andrew-weisman opened 11 months ago
Is the suggestion here that we perform the 'species_name_short' and 'species_name_long' regex actions within the dataset_formats function? Presently I am using the functions within the Basic Phenotyper Library (BPL) here.
Notice also that I've added a column here, in addition to 'species_name_short' and 'species_name_long', that makes the above-mentioned filtering very easy. I just treat it like any other dataframe feature.
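A minimal sketch of the "flag column" approach, assuming each object is a row with one boolean column per marker. The marker names, the `all_negative` column name, and the helper function are hypothetical, and plain Python rows stand in for a dataframe to keep the example dependency-free:

```python
from typing import Dict, List

Row = Dict[str, object]


def add_all_negative_column(rows: List[Row], markers: List[str]) -> None:
    """Add a hypothetical 'all_negative' feature to each object (row), in place.

    An object is all-negative when none of the given marker columns is positive.
    """
    for row in rows:
        row["all_negative"] = not any(row[marker] for marker in markers)


# Hypothetical example data: one row per object, one boolean column per marker.
markers = ["CD3", "CD8", "PD1"]
objects = [
    {"id": 1, "CD3": True, "CD8": False, "PD1": False},
    {"id": 2, "CD3": False, "CD8": False, "PD1": False},
    {"id": 3, "CD3": True, "CD8": True, "PD1": True},
]
add_all_negative_column(objects, markers)

# The new column is then filtered on like any other feature.
real_cells = [row for row in objects if not row["all_negative"]]
print([row["id"] for row in real_cells])  # [1, 3]
```

Keeping the flag as a column (rather than dropping rows) leaves the decision to each downstream consumer.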
Thanks @djsmith17. If I understand what you're saying correctly, I mean to do it in dataset_formats.py, where it can happen as early as possible, so that, as far as possible, no extra rows are ever carried around. The filtering could be optional, or perhaps keeping the all-negative objects could be the option. Can you think of any situations aside from multiaxial gating where we'd ever want these all-negative objects?
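If the filtering lived in dataset_formats.py, it could look roughly like the sketch below, which drops all-negative objects right after loading and makes keeping them an opt-in. The function name, the `keep_all_negative` parameter, and the boolean per-marker columns are hypothetical; this is not the existing dataset_formats.py API.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def drop_all_negative_objects(
    rows: Iterable[Row], markers: List[str], keep_all_negative: bool = False
) -> List[Row]:
    """Hypothetical load-time step: drop objects negative for every marker.

    keep_all_negative=True keeps them, e.g. for multiaxial gating, where
    all-negative objects are still of interest.
    """
    if keep_all_negative:
        return list(rows)
    # Keep only objects that are positive for at least one marker.
    return [row for row in rows if any(row[marker] for marker in markers)]


markers = ["CD3", "CD8"]
objects = [
    {"id": 1, "CD3": True, "CD8": False},
    {"id": 2, "CD3": False, "CD8": False},
]
print(len(drop_all_negative_objects(objects, markers)))  # 1
print(len(drop_all_negative_objects(objects, markers, keep_all_negative=True)))  # 2
```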
Note that another reason to do this is so that there's no "Other" species in the exported phenotype assignments (particularly those generated by the SIT from the Phenotyper's data), though I suppose I could simply not export the "Other" row in that dataframe.
@djsmith17 I'd probably be fine with doing this in each app separately. It's always done in the SIT. I could see value in not doing it in dataset_formats.py, so that in the Gater and Phenotyper you can learn about all objects even if they're not real cells. I would implement this by default in the neighborhood profiler and then give users an option to keep these objects in the analysis if desired. That would also speed up that app by default. In general, I'd think that if the user did not explicitly define a phenotype for certain objects in the Gater or Phenotyper, they do not want to analyze everything else as an additional phenotype; they just want to analyze the ones they explicitly defined.
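A per-app default like the one proposed for the neighborhood profiler could look like the sketch below: objects matching none of the explicitly defined phenotypes are excluded unless the user opts to keep them, in which case they fall into an "Other" group. The function, the `keep_unassigned` option, and the predicate-based phenotype definitions are hypothetical, not the actual Gater, Phenotyper, or neighborhood profiler code.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, bool]
Phenotypes = Dict[str, Callable[[Row], bool]]


def assign_phenotypes(
    rows: List[Row], phenotypes: Phenotypes, keep_unassigned: bool = False
) -> List[Tuple[Row, List[str]]]:
    """Pair each object with the explicitly defined phenotypes it matches.

    Objects matching none of them are dropped by default; with
    keep_unassigned=True they are kept under an "Other" label instead.
    """
    assigned = []
    for row in rows:
        names = [name for name, matches in phenotypes.items() if matches(row)]
        if names:
            assigned.append((row, names))
        elif keep_unassigned:
            assigned.append((row, ["Other"]))
    return assigned


# Hypothetical user-defined phenotypes, expressed as predicates on marker columns.
phenotypes: Phenotypes = {
    "T cell": lambda r: r["CD3"],
    "Cytotoxic T cell": lambda r: r["CD3"] and r["CD8"],
}
objects = [
    {"CD3": True, "CD8": True},
    {"CD3": False, "CD8": False},
]
print([names for _, names in assign_phenotypes(objects, phenotypes)])
# [['T cell', 'Cytotoxic T cell']]
print([names for _, names in assign_phenotypes(objects, phenotypes, keep_unassigned=True)])
# [['T cell', 'Cytotoxic T cell'], ['Other']]
```

Dropping unassigned objects by default also shrinks the data the app has to process, which is where the expected speedup would come from.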
This should probably be done in dataset_formats.py.