Open podpearson opened 5 years ago
@sclaugoncalves Do you foresee any issues with always using the alfresco study as the sequencescape study going forward?
@tnguyensanger no issues, we have already been doing this for a while now...
@tnguyensanger, @podpearson we can also update old study names in sequencescape to map 1 to 1 to alfresco
> @tnguyensanger no issues, we have already been doing this for a while now...
Sweet :).
@sclaugoncalves @podpearson There are over 10000 samples in FITS which are missing an Alfresco Study but are marked as R&D. See attached file for full list.
There is an entry in /nfs/team112_internal/rp7/src/github/malariagen/SIMS/meta/mlwh/sequencescape_alfresco_study_mappings.txt
to map SequenceScape study <==> Alfresco study for "Team 112 R&D" <==> "1089-R&D".
Do we ever do R&D for other teams/labs?
How should we handle samples with SequenceScape studies "Malaria Programme R&D" or "Malaria R&D" but an empty Alfresco study?
To see which samples fall under this category, try running this query in FITS
select * from vw_pivot_sample where vw_pivot_sample.alfresco_study is null and vw_pivot_sample.sequenscape_study_name like '%R&D';
Thanks @sclaugoncalves and @tnguyensanger . I have been considering for a while whether we should apply all the "exceptions" I have found (see https://github.com/malariagen/SIMS/tree/master/meta/mlwh) back to sequencescape. The more I think about this the more I think it is what we should do. Before doing this, I think we first need to understand whether this information would then feed through to other systems, particularly mlwh, iRODS and subtrack. I think we would also need to think about what audit trails we might need. Any thoughts gratefully received!
> Do we ever do R&D for other teams/labs?
Not that I'm aware of
> How should we handle samples with SequenceScape studies "Malaria Programme R&D" or "Malaria R&D" but an empty Alfresco study?
I think in general these have empty Alfresco study because they are simply not associated with any Alfresco study. They are R&D samples, e.g. created by doing stuff in the lab to cultured lab strains, rather than samples received from partners.
@tnguyensanger yes, R&D samples are not assigned to an alfresco study...
In the past all groups in the programme were submitting R&D samples through our R&D study (Team 112), but when the number of samples started to increase we decided to split it. All other groups now submit to the study Malaria R&D, but I guess there might still be some old samples from other groups in our study. We can move those to the correct study.
@podpearson agree, we need to check with core whether any change will pass on to all systems. As a small test, I requested core to correct the study name for study 1131 (your first comment here). I'll let you know when it's done and we can then check it...
@sclaugoncalves , after you requested core to correct study name from IHTP_1131-PF-BN-BERTIN to 1131-PF-BJ-BERTIN, it seems that files have had the study name changed correctly in both mlwh and iRODS.
You had earlier requested that 5 samples (RCN03610, RCN06860, RCN06881, RCN06893, RCN06911) were moved from study 1195-PF-TRAC2-DONDORP to study 1180-PF-TRAC2-DONDORP. These changes don't appear to have propagated through, but this might be because the original study name was actually 1195-PF-TRAC2_DONDORP (note the underscore rather than hyphen before DONDORP). I'll forward on the original email about this.
For details see https://github.com/malariagen/fits/blob/master/work/44_populating_alfresco_study/20190123_check_if_study_was_changed_in_mlwh_and_irods.ipynb
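A quick way to spot near-miss study names like the underscore/hyphen case above, as a minimal sketch. The helper names `separator_key` and `differ_only_by_separators` are hypothetical and not part of FITS or SIMS; the idea is simply to compare names after collapsing `-`, `_` and whitespace into one separator.

```python
import re


def separator_key(study_name):
    """Collapse runs of '-', '_' and whitespace into a single '-'."""
    return re.sub(r"[-_\s]+", "-", study_name.strip())


def differ_only_by_separators(name_a, name_b):
    """True if the two names are different strings but match once separators are normalised."""
    return name_a != name_b and separator_key(name_a) == separator_key(name_b)


# Study names taken from the comment above.
requested = "1195-PF-TRAC2-DONDORP"
actual = "1195-PF-TRAC2_DONDORP"
if differ_only_by_separators(requested, actual):
    print(f"{requested!r} and {actual!r} differ only in separators")
```

Running something like this over the list of requested study names before sending them to core might catch this kind of mismatch early, although it would not catch differences such as BN vs BJ.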
In many cases there is a 1-to-1 mapping between sequencescape study and alfresco study. In some cases, the names are identical. In some cases, the alfresco study can be (and has been) inferred from the sequencescape study name (e.g. the "IHTP_PWGS 1134-PF-ML-CONWAY" study is Alfresco study 1134-PF-ML-CONWAY). In some cases, the two are subtly different (IHTP_1131-PF-BN-BERTIN vs 1131-PF-BJ-BERTIN - note BN vs BJ). In some cases, domain knowledge is required (e.g. sequencescape study "Plasmodium HB3xDD2 progeny" maps to Alfresco study "1041-PF-US-FERDIG").
Rather than inferring based on some rule, a more complete and accurate method for populating "Alfresco study" from sequencescape study might be to use a mapping file. I previously created such a thing when I was building manifests. A symlink to the latest version can be found at
/nfs/team112_internal/rp7/src/github/malariagen/SIMS/meta/mlwh/sequencescape_alfresco_study_mappings.txt
Could we consider incorporating such a mapping file into the process of populating the "Alfresco study" tags?
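To make the mapping-file idea concrete, here is a minimal sketch of a lookup. It assumes the file is tab-separated with the sequencescape study in the first column and the Alfresco study in the second; the real layout of sequencescape_alfresco_study_mappings.txt may differ. The function names are hypothetical, and the example rows are the pairs mentioned in this thread, not the contents of the actual file.

```python
import csv
import io


def load_study_mappings(handle):
    """Read 'sequencescape study<TAB>alfresco study' rows into a dict.

    Assumed format: two tab-separated columns; blank lines and lines
    starting with '#' are skipped.
    """
    mappings = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        mappings[row[0].strip()] = row[1].strip()
    return mappings


def alfresco_study_for(sequencescape_study, mappings):
    """Return the mapped Alfresco study, or None if there is no entry."""
    return mappings.get(sequencescape_study)


# Illustrative rows only, built from the examples in this thread.
example_file = io.StringIO(
    "IHTP_PWGS 1134-PF-ML-CONWAY\t1134-PF-ML-CONWAY\n"
    "IHTP_1131-PF-BN-BERTIN\t1131-PF-BJ-BERTIN\n"
    "Plasmodium HB3xDD2 progeny\t1041-PF-US-FERDIG\n"
    "Team 112 R&D\t1089-R&D\n"
)
mappings = load_study_mappings(example_file)
print(alfresco_study_for("Plasmodium HB3xDD2 progeny", mappings))
```

Returning None for unmapped studies (rather than guessing from the name) keeps cases like "Malaria R&D", which have no Alfresco study, explicit.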