Open magnusmanske opened 6 years ago
Thanks @magnusmanske, I think this is very useful, though it is somewhat different from what I was expecting. Comments below.
I've decided to reopen this as I think we should only close once we have a signed-off version (I'll close #15 now).
fits_create_manifest --sample_ids_file <sample_ids_filename>
or a web interface? I think I was expecting that the first release would have some way of creating manifests that didn't involve writing complex SQL queries. I think it's fine if this isn't in scope for the first release, but I think we do need to state what the long-term goal is. If you already have ideas about what the long-term goal is, write them in here. If this still needs to be decided, have some statement here about the fact that this needs to be decided, and create a new issue to capture this that is referenced from here.
Regarding "Duplicate of qc_complete; not sure what this is supposed to be": description is not a duplicate of qc_complete, it is mlwh.iseq_run_status_dict.description (which should contain "qc complete" for all samples). Not vital at this stage though.
Now here
@podpearson "feels like a description of the database" - I think the difference here is that the descriptions and commands in the document are specifically designed for manifest creation, whereas the database documentation gives more of a general overview.
"manifest for a given set of samples" That's in there, see "Likewise, to get all FITS samples that have one of your favourite Oxford codes (tag_id
=3561)". You'll have to combine that with the manifest generation code below; hence, "This tutorial assumes a basic knowledge of MySQL syntax."
"manifest for a given set of samples" That's in there, see "Likewise, to get all FITS samples that have one of your favourite Oxford codes (tag_id=3561)". You'll have to combine that with the manifest generation code below; hence, "This tutorial assumes a basic knowledge of MySQL syntax."
My reading of this is that it gives you all the FITS samples for one single Oxford code, but I think the more common use case is getting all data for a list of Oxford codes.
A list of all Oxford codes can be retrieved via SELECT DISTINCT value FROM vw_sample_tag WHERE tag_id=3561. You can, of course, specify this further. "This tutorial assumes a basic knowledge of MySQL syntax."
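For the "list of Oxford codes" case, the query above could be narrowed with an IN clause. The following is only a minimal sketch: it uses an in-memory SQLite table as a stand-in for vw_sample_tag, the sample_id column is an assumption (only tag_id and value are named in this thread), and the Oxford codes and the helper name sample_ids_for_oxford_codes are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in for the FITS view. Only tag_id and value are named in
# this thread; the sample_id column is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vw_sample_tag (sample_id INTEGER, tag_id INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO vw_sample_tag VALUES (?, ?, ?)",
    [
        (1, 3561, "OX0001"),  # made-up Oxford codes
        (2, 3561, "OX0002"),
        (3, 3561, "OX0003"),
        (1, 3604, "some-other-tag"),  # a different tag on the same sample
    ],
)

OXFORD_CODE_TAG_ID = 3561  # tag_id for Oxford codes, as quoted in this thread


def sample_ids_for_oxford_codes(conn, codes):
    """Return the sorted sample_ids whose Oxford code is one of `codes`."""
    codes = list(codes)
    if not codes:
        return []
    # One placeholder per code, so the values are bound rather than
    # pasted into the SQL string.
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT DISTINCT sample_id FROM vw_sample_tag "
        f"WHERE tag_id = ? AND value IN ({placeholders}) "
        "ORDER BY sample_id"
    )
    return [row[0] for row in conn.execute(sql, [OXFORD_CODE_TAG_ID, *codes])]


wanted = sample_ids_for_oxford_codes(conn, ["OX0001", "OX0003"])
print(wanted)
```

The SQL itself (tag_id = ... AND value IN (...)) should carry over to MySQL unchanged; the placeholder style depends on the client library, so treat the ? markers as SQLite-specific.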
@magnusmanske - I've just made a pull request with two suggested minor typo fixes.
"There can only ever be one row for a tag/(sample/file)/value combination". Ids 3689178 and 7426287 in sample2tag both have the same values of sample_id (190), tag_id (3604) and value (1195-PF-TRAC2-DONDORP). Is this a mistake?
This issue should be left open until we have sign-off, i.e. agreement at a production meeting that this is good to go.
Regarding the "Complex whitelist/blacklist filter" in the where clause, can you remind me exactly what these lists were? Are they essentially the diff between what taxon mlwh thinks files are, and what species we have them according to Solaris study_group? If so, I think a cleaner way to address this would be to have the taxon tag populated with information from Solaris study_group where this exists, but from mlwh taxon otherwise. Has this already been done?
First attempt here.