Open andrew-weisman opened 7 months ago
We will need to start prioritizing this issue soon: without it we can't reproduce the GMB results, because the phenotypes are incorrect, at least when using the Phenotyper. Honestly, it should be straightforward, and I may have even included the code that does this automatically when I initially sent over the original Phenotyper code (I'll look into that first).
For the time being, I can get the phenotyping right by loading the old phenotype assignments file from the SIT, but that completely bypasses the Phenotyper.
Note that since this is not implemented in the Phenotyper, I added a warning when pulling in data from the Phenotyper to the SIT; see the function create_phenotype_assignments_file_from_phenotyper() in 03a_Tool_parameter_selection.py. Basically, in the SIT I'm not allowing the user to attempt compound phenotype identification, which is specified in the phenotype assignments TSV file by separating multiple phenotypes with a hyphen. This restriction only affects automatic TSV file creation by the Phenotyper; compound phenotypes are still allowed in a manually edited TSV file. See more details in the referenced comment.
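To make the hyphen convention concrete, here is a minimal sketch of how compound entries could be spotted in an assignments TSV. This is not the actual check in the SIT: the column names (`species_id`, `phenotype`), the helper name `find_compound_phenotypes`, and the sample data are all made up for illustration.

```python
import csv
import io
import warnings


def find_compound_phenotypes(tsv_text, phenotype_column="phenotype", sep="-"):
    """Return the phenotype entries that contain the compound separator.

    Hypothetical helper: the real TSV layout and column names may differ.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    compound = [
        row[phenotype_column] for row in reader if sep in row[phenotype_column]
    ]
    if compound:
        # Mirrors the idea of warning the user rather than failing outright.
        warnings.warn(f"Compound phenotypes found: {compound}")
    return compound


# Made-up sample with one simple and one compound assignment.
sample_tsv = "species_id\tphenotype\n1\tA\n2\tA-B\n"
found = find_compound_phenotypes(sample_tsv)
```

One caveat with this kind of check: a phenotype whose own name contains a hyphen would be flagged as compound, so the separator would need to be reserved.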
E.g., if a user assigns A-B to a species, the coordinates are duplicated so that there's one set for phenotype A and one set for phenotype B. The code for this is already in the SIT library. Not high priority right now.
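The A-B duplication described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the SIT library: the function name `expand_compound_phenotypes` and the record keys (`x`, `y`, `phenotype`) are assumptions.

```python
def expand_compound_phenotypes(rows, sep="-"):
    """Duplicate each record once per phenotype in a compound assignment.

    `rows` is a list of dicts with hypothetical keys 'x', 'y', 'phenotype'.
    A record assigned 'A-B' becomes two records with identical coordinates,
    one labeled 'A' and one labeled 'B'.
    """
    expanded = []
    for row in rows:
        for phenotype in row["phenotype"].split(sep):
            new_row = dict(row)  # copy so the input records are not modified
            new_row["phenotype"] = phenotype
            expanded.append(new_row)
    return expanded


# Made-up example data: one compound assignment and one simple one.
cells = [
    {"x": 1.0, "y": 2.0, "phenotype": "A-B"},
    {"x": 3.0, "y": 4.0, "phenotype": "C"},
]
expanded_cells = expand_compound_phenotypes(cells)
```

Here the first record yields two output records sharing the coordinates (1.0, 2.0), while the simple assignment passes through unchanged.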