malariagen / fits

File tracking system for group DK

Determine files for which we have information only from iRODS #58

Open podpearson opened 5 years ago

podpearson commented 5 years ago

There is some debate about whether we need to pull in information from iRODS (see comments at https://github.com/malariagen/fits/issues/56). There are some files in the current version of FITS for which the only information we have is from iRODS. An example is 5528_5_human.bam. However, in this specific case we don't have access to the file: it presumably contains the reads that map to the human reference, and hence we shouldn't have access to it.

To decide whether we need to pull in data from iRODS, it would be useful to know whether there are files which we think should be in FITS but which are only accessible by querying iRODS/baton. To do this we would first need a list of such files, which is the purpose of this issue.

Once the list is created, I would suggest creating a separate issue to analyse it. I think this analysis is currently lower priority than the work I suggested in https://github.com/malariagen/fits/issues/56

podpearson commented 5 years ago

I am assuming this issue will be resolved by comparing the output of the MLWH/Subtrack-only import (#62) with the previous version of FITS, which also included information from iRODS/baton.
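
As a rough illustration of that comparison, here is a minimal sketch. It assumes both versions can be exported as plain lists of file names; the function name `files_only_from_irods` and every sample file name except 5528_5_human.bam are made up for the example and are not part of FITS.

```python
def files_only_from_irods(previous_fits_files, mlwh_subtrack_files):
    """Return files in the previous FITS version that are missing from
    the MLWH/Subtrack-only import, i.e. files known only via iRODS/baton.

    Both arguments are assumed to be iterables of file names.
    """
    return sorted(set(previous_fits_files) - set(mlwh_subtrack_files))


# Hypothetical sample data: only 5528_5_human.bam comes from this issue.
previous = ["5528_5_human.bam", "1234_1.bam", "1234_2.bam"]
mlwh_subtrack_only = ["1234_1.bam", "1234_2.bam"]

print(files_only_from_irods(previous, mlwh_subtrack_only))
```

The resulting list would be the input to the follow-up analysis issue. A set difference ignores duplicates and ordering, which seems adequate if file names are unique within FITS.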