malariagen / fits

File tracking system for group DK

vw_pivot_file.alignment_filter column sometimes differs from iRODS #18

Closed tnguyensanger closed 6 years ago

tnguyensanger commented 6 years ago

Sometimes alignment_filter is empty in FITS but populated in iRODS.

E.g., FITS query:

SELECT * FROM mm6_fits.vw_pivot_file where full_path like '%_human.bam' and alignment_filter is null

# full_path, alignment_filter
/seq/245/245_3_nonhuman.bam, 
/seq/245/245_5_nonhuman.bam, 
/seq/245/245_6_nonhuman.bam, 
/seq/245/245_7_nonhuman.bam, 
/seq/245/245_8_nonhuman.bam, 
/seq/368/368_3_nonhuman.bam, 
/seq/368/368_5_nonhuman.bam, 
/seq/368/368_6_nonhuman.bam, 
/seq/368/368_7_nonhuman.bam, 
/seq/368/368_8_nonhuman.bam, 
/seq/531/531_5_nonhuman.bam, 
/seq/531/531_6_nonhuman.bam, 
/seq/585/585_1_nonhuman.bam, 
/seq/585/585_2_nonhuman.bam, 
/seq/585/585_3_nonhuman.bam, 
/seq/585/585_5_nonhuman.bam, 
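
A note on why the `'%_human.bam'` pattern returns `_nonhuman.bam` files: in SQL `LIKE`, `_` matches any single character, so the pattern also matches names ending in `nhuman.bam`. The sketch below illustrates this with an in-memory SQLite table standing in for `mm6_fits.vw_pivot_file`. The database engine behind FITS is not stated in this thread, so the `ESCAPE` syntax is an assumption that may need adjusting, and `245_3_human.bam` is a hypothetical file name.

```python
import sqlite3

# Hypothetical in-memory stand-in for mm6_fits.vw_pivot_file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vw_pivot_file (full_path TEXT, alignment_filter TEXT)")
conn.executemany(
    "INSERT INTO vw_pivot_file VALUES (?, ?)",
    [
        ("/seq/245/245_3_nonhuman.bam", None),
        ("/seq/245/245_3_human.bam", None),  # hypothetical file name
    ],
)


def paths_matching(pattern_clause):
    """Return full_path values with a NULL alignment_filter matching a LIKE clause."""
    sql = (
        "SELECT full_path FROM vw_pivot_file "
        "WHERE full_path LIKE " + pattern_clause +
        " AND alignment_filter IS NULL ORDER BY full_path"
    )
    return [row[0] for row in conn.execute(sql)]


# Unescaped: "_" is a single-character wildcard, so both rows match.
unescaped = paths_matching("'%_human.bam'")

# Escaped: "\_" is a literal underscore, so only the "_human" row matches.
escaped = paths_matching(r"'%\_human.bam' ESCAPE '\'")
```

For finding rows with a missing alignment_filter the wildcard match is harmless, but a query meant to select only `_human` files would need the escaped form.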

iRODS query:

$ imeta ls -d /seq/245/245_3_nonhuman.bam alignment_filter 
AVUs defined for dataObj /seq/245/245_3_nonhuman.bam:
attribute: alignment_filter
value: nonhuman
units: 
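
To check many files rather than one, the `imeta` output above can be parsed and compared against the FITS value. This is a minimal sketch: `parse_imeta_avus` is a hypothetical helper, it only assumes the `attribute:` / `value:` / `units:` layout shown above, and it assumes that multiple AVUs are separated by a line of dashes.

```python
SAMPLE_OUTPUT = """AVUs defined for dataObj /seq/245/245_3_nonhuman.bam:
attribute: alignment_filter
value: nonhuman
units: 
"""


def parse_imeta_avus(text):
    """Parse text printed by `imeta ls -d` into a list of AVU dicts."""
    avus = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("----"):
            # Assumed separator between consecutive AVUs.
            if current:
                avus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        # The "AVUs defined for dataObj ..." header is skipped here
        # because its key is not one of the three AVU fields.
        if sep and key in ("attribute", "value", "units"):
            current[key] = value.strip()
    if current:
        avus.append(current)
    return avus


avus = parse_imeta_avus(SAMPLE_OUTPUT)
irods_filter = next(
    (a["value"] for a in avus if a.get("attribute") == "alignment_filter"), None
)
```

With the sample above, `irods_filter` is `"nonhuman"`, which can then be compared with the (empty) FITS column for the same path.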

podpearson commented 6 years ago

@tnguyensanger I've not looked at the alignment_filter before. Do you know what this means? What are you using it for?

tnguyensanger commented 6 years ago

@podpearson I haven't read any official documentation for it, but inferring from its usage, it specifies the reference the reads were aligned against. It only seems to exist for Plasmodium and Anopheles samples. It doesn't exist for the combined human/Plasmodium samples (e.g. /seq/15235/15235_2.cram). I don't know which is more reliable: using the alignment_filter column, or simply filtering out any files with phix or _human in the filename.
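
For comparison, the filename-based alternative could look roughly like the sketch below. `is_excluded_by_name` is a hypothetical helper, and the `_human` and `phix` example file names are made up for illustration. Only the two substrings come from the comment above.

```python
import posixpath


def is_excluded_by_name(full_path):
    """Return True if the file name contains 'phix' or '_human'."""
    name = posixpath.basename(full_path).lower()
    # Plain substring test: "_" is literal here, so "_nonhuman" does not match.
    return "phix" in name or "_human" in name


paths = [
    "/seq/245/245_3_nonhuman.bam",
    "/seq/245/245_3_human.bam",  # hypothetical file name
    "/seq/245/245_3_phix.bam",   # hypothetical file name
    "/seq/15235/15235_2.cram",
]
kept = [p for p in paths if not is_excluded_by_name(p)]
```

This keeps the `_nonhuman` file and the combined `.cram` file, and relies entirely on the naming convention being applied consistently, which is the reliability question raised above.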

magnusmanske commented 6 years ago

This was a deep one...

alignment_filter was not set because it didn't appear in the baton JSON.

It didn't appear in the baton JSON because I didn't have the baton JSON stored.

I didn't have the baton JSON stored because the samples for these files had no Multi-LIMS warehouse (MLWH) ID.

Added the MLWH IDs, got the baton JSON from iRODS, populated alignment_filter. Your original query now returns 0 results. As things should be.
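
The dependency chain above (no stored baton JSON, hence no alignment_filter) can be sketched as follows. This is not the FITS code: `alignment_filter_from_baton` is a hypothetical helper, and the JSON layout (`collection`, `data_object`, and an `avus` list of attribute/value pairs) is an assumption about what baton emits.

```python
import json


def alignment_filter_from_baton(baton_json):
    """Return the alignment_filter value from a stored baton JSON string, or None."""
    if not baton_json:
        # No stored baton JSON (e.g. the sample had no MLWH ID):
        # nothing to populate the column from.
        return None
    record = json.loads(baton_json)
    for avu in record.get("avus", []):
        if avu.get("attribute") == "alignment_filter":
            return avu.get("value")
    return None


# Assumed shape of a baton record for one of the affected files.
stored = json.dumps({
    "collection": "/seq/245",
    "data_object": "245_3_nonhuman.bam",
    "avus": [{"attribute": "alignment_filter", "value": "nonhuman"}],
})

populated = alignment_filter_from_baton(stored)
missing = alignment_filter_from_baton(None)
```

Here `missing` stays `None`, which corresponds to the NULL column seen in the original query, while `populated` picks up `"nonhuman"` once the JSON is available.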

podpearson commented 6 years ago

@tnguyensanger to review

tnguyensanger commented 6 years ago

@magnusmanske Thanks for fixing this Magnus! Would it be possible to get a link to the commit that addresses the issue? Ad-hoc manual testing shows the issue has been resolved.

magnusmanske commented 6 years ago

The commit would be: https://github.com/malariagen/fits/commit/08d87ceaf8c34e396b345a6ce8a328c624d068ab