Probably filter out all-negative markers in a datafile probably in all cases except for multiaxial gating

ncats / multiplex-analysis-web-apps

https://ncats.github.io/multiplex-analysis-web-apps/


Probably filter out all-negative markers in a datafile probably in all cases except for multiaxial gating #32

Open andrew-weisman opened 11 months ago

andrew-weisman commented 11 months ago

Probably should be done in dataset_formats.py

djsmith17 commented 11 months ago

Is the suggestion here that we perform the 'species_name_short' and 'species_name_long' regex actions within the dataset_formats function? Presently I am using the functions within the Basic Phenotyper Library (BPL) here.

Notice also that I've added a column here, in addition to 'species_name_short' and 'species_name_long', that makes the above-mentioned filtering very easy. I just treat it like any other dataframe feature.
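For illustration, a minimal sketch of how such a flag column could be built and used with pandas. The column name `has_positive_marker`, the `marker_*` columns, and the helper `flag_all_negative` are hypothetical, not names taken from the BPL; the sketch also assumes marker positivity is stored as 0/1 values.

```python
import pandas as pd


def flag_all_negative(df, marker_cols, flag_col="has_positive_marker"):
    """Return a copy of df with a boolean column that is True when
    at least one of the marker columns is positive for that object.

    Both the helper and the default column name are hypothetical.
    """
    out = df.copy()
    # Treat any non-zero marker value as positive, then test row-wise.
    out[flag_col] = out[marker_cols].astype(bool).any(axis=1)
    return out


# Toy data: the second object is negative for every marker.
cells = pd.DataFrame({
    "marker_CD3": [1, 0, 0, 1],
    "marker_CD8": [0, 0, 1, 1],
})

flagged = flag_all_negative(cells, ["marker_CD3", "marker_CD8"])

# The flag can then be filtered on like any other dataframe feature.
positive_only = flagged[flagged["has_positive_marker"]]
```

Keeping the flag as a column, rather than dropping rows immediately, leaves the decision about all-negative objects to whichever app consumes the dataframe.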

andrew-weisman commented 11 months ago

Thanks @djsmith17. If I understand what you're saying correctly, I mean to do it in dataset_formats.py, where it can be done as early as possible so that extra rows are carried around as little as possible. Filtering could be the optional behavior, or alternatively keeping the all-negative markers could be the option. Can you think of any situations aside from multiaxial gating where we'd ever want these all-negative objects?

andrew-weisman commented 11 months ago

Note another reason to do this is so there's no "Other" species in the exported phenotype assignments (particularly as generated by the SIT from the Phenotyper's data), though I suppose I could just not export the "Other" row in that dataframe.

andrew-weisman commented 9 months ago

@djsmith17 I'd probably be fine with doing this in each app separately. It's always done in the SIT. I could see value in not doing it in dataset_formats.py so that in the Gater and Phenotyper you can learn about all objects even if they're not real cells. I would implement this by default in the neighborhood profiler and then give them an option to keep them in the analysis if desired. That would also speed up that app by default. I'd think in general that if the user did not explicitly define phenotypes in the Gater or Phenotyper, then they generally do not want to analyze everything else as an additional phenotype; they just want to analyze the ones they explicitly defined.
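A rough sketch of the per-app behavior described above: drop all-negative objects by default and let the user opt in to keeping them. The function `select_objects_for_analysis`, its `keep_all_negative` parameter, and the `has_positive_marker` column are all hypothetical and only illustrate the idea; they are not the app's actual API.

```python
import pandas as pd


def select_objects_for_analysis(df, flag_col="has_positive_marker",
                                keep_all_negative=False):
    """Return the objects an app should analyze.

    By default, objects with no positive marker are dropped; passing
    keep_all_negative=True returns the dataframe unchanged.
    (Hypothetical helper; names are assumptions.)
    """
    if keep_all_negative:
        return df
    # Boolean-mask filtering; reset the index so downstream code
    # sees a contiguous range.
    return df[df[flag_col]].reset_index(drop=True)


# Toy data with one all-negative object, labeled "Other".
objects = pd.DataFrame({
    "phenotype": ["CD3+", "Other", "CD8+"],
    "has_positive_marker": [True, False, True],
})

default_view = select_objects_for_analysis(objects)
full_view = select_objects_for_analysis(objects, keep_all_negative=True)
```

With the default, fewer rows reach the analysis, which is where the speedup would come from, and no "Other" group appears unless the user asks for it.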