malariagen / fits

File tracking system for group DK

Write document describing how to create a build manifest from the database #14

Open magnusmanske opened 6 years ago

magnusmanske commented 6 years ago

First attempt here.

podpearson commented 6 years ago

Thanks @magnusmanske, I think this is very useful, though it is somewhat different from what I was expecting. Comments below.

I've decided to reopen this as I think we should only close once we have a signed-off version (I'll close #15 now).

podpearson commented 6 years ago

Regarding "Duplicate of qc_complete": I'm not sure what this is supposed to mean. description is not a duplicate of qc_complete; it is mlwh.iseq_run_status_dict.description (which should contain "qc complete" for all samples). Not vital at this stage, though.

magnusmanske commented 6 years ago

Now here

podpearson commented 6 years ago
magnusmanske commented 6 years ago

@podpearson "feels like a description of the database" - I think the difference here is that the descriptions and commands in the document are specifically designed for manifest creation, whereas the database documentation gives more of a general overview.

"manifest for a given set of samples" That's in there, see "Likewise, to get all FITS samples that have one of your favourite Oxford codes (tag_id=3561)". You'll have to combine that with the manifest generation code below; hence, "This tutorial assumes a basic knowledge of MySQL syntax."

podpearson commented 5 years ago

"manifest for a given set of samples" That's in there, see "Likewise, to get all FITS samples that have one of your favourite Oxford codes (tag_id=3561)". You'll have to combine that with the manifest generation code below; hence, "This tutorial assumes a basic knowledge of MySQL syntax."

My reading of this is that it gives you all the FITS samples for one single Oxford code, but I think the more common use case is getting all data for a list of Oxford codes.

magnusmanske commented 5 years ago

A list of all Oxford codes can be retrieved via SELECT DISTINCT `value` FROM vw_sample_tag WHERE tag_id=3561. You can, of course, specify this further. "This tutorial assumes a basic knowledge of MySQL syntax."
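
For the list-of-codes case raised above, one way to "specify this further" is an IN (...) filter on the same view. The following is only a sketch under stated assumptions: it uses an in-memory SQLite table as a stand-in for the real MySQL database, the sample_id column on vw_sample_tag is assumed (only tag_id and value are confirmed in this thread), the Oxford codes are made up, and samples_for_oxford_codes is a hypothetical helper name.

```python
import sqlite3

# Toy stand-in for the FITS view. The real database is MySQL, and the
# sample_id column name is an assumption made for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vw_sample_tag (sample_id INTEGER, tag_id INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO vw_sample_tag VALUES (?, ?, ?)",
    [
        (1, 3561, "PA0001-C"),  # made-up Oxford codes
        (2, 3561, "PA0002-C"),
        (3, 3561, "PA0003-C"),
        (1, 3604, "1195-PF-TRAC2-DONDORP"),  # a row under a different tag
    ],
)


def samples_for_oxford_codes(conn, codes):
    """Return sorted sample IDs whose Oxford code (tag_id=3561) is in codes."""
    codes = list(codes)
    if not codes:
        return []  # an empty IN () list is not valid SQL, so short-circuit
    # One "?" placeholder per code keeps the values out of the SQL string.
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT DISTINCT sample_id FROM vw_sample_tag "
        f"WHERE tag_id = 3561 AND value IN ({placeholders}) "
        "ORDER BY sample_id"
    )
    return [row[0] for row in conn.execute(sql, codes)]


oxford_samples = samples_for_oxford_codes(conn, ["PA0001-C", "PA0003-C"])
print(oxford_samples)
```

The resulting sample IDs would still need to be combined with the manifest generation query from the document; this sketch only covers the sample selection step.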

podpearson commented 5 years ago

@magnusmanske - I've just made a pull request with two suggested minor typo fixes.

"There can only ever be one row for a tag/(sample/file)/value combination". Ids 3689178 and 7426287 in sample2tag both have the same values of sample_id (190), tag_id (3604) and value (1195-PF-TRAC2-DONDORP). Is this a mistake?

This issue should be left open until we have sign-off, i.e. agreement at a production meeting that this is good to go.
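
To see whether the two sample2tag rows mentioned above are an isolated case, a GROUP BY / HAVING query can list every sample/tag/value combination that occurs more than once. This is a sketch, not a statement about the real schema: it runs against an in-memory SQLite copy rather than the MySQL database, the id column name is assumed from the "Ids" wording, and the third row is made up to show a non-duplicate.

```python
import sqlite3

# Toy copy of sample2tag. The column name "id" is an assumption; the first
# two rows are the ones quoted in this thread, the third is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample2tag ("
    "id INTEGER PRIMARY KEY, sample_id INTEGER, tag_id INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO sample2tag VALUES (?, ?, ?, ?)",
    [
        (3689178, 190, 3604, "1195-PF-TRAC2-DONDORP"),
        (7426287, 190, 3604, "1195-PF-TRAC2-DONDORP"),
        (1, 191, 3604, "1195-PF-TRAC2-DONDORP"),  # different sample, not a duplicate
    ],
)

# Group on the three columns that are supposed to be unique together and
# keep only the groups that contain more than one row.
DUPLICATE_SQL = """
SELECT sample_id, tag_id, value, COUNT(*) AS n_rows,
       MIN(id) AS first_id, MAX(id) AS last_id
FROM sample2tag
GROUP BY sample_id, tag_id, value
HAVING COUNT(*) > 1
"""

duplicates = conn.execute(DUPLICATE_SQL).fetchall()
for row in duplicates:
    print(row)
```

If the one-row-per-combination rule is meant to hold, a unique index over those three columns would be one way to enforce it, though whether that fits the real table is for the database owner to judge.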

podpearson commented 5 years ago

Regarding the "Complex whitelist/blacklist filter" in the where clause, can you remind me exactly what these lists were? Are they essentially the diff between what taxon mlwh thinks files are and what species we have them as according to Solaris study_group? If so, I think a cleaner way to address this would be to have the taxon tag populated with information from Solaris study_group where this exists, and from mlwh taxon otherwise. Has this already been done?