malariagen / fits

File tracking system for group DK

Rewrite database description document #9

Open magnusmanske opened 6 years ago

magnusmanske commented 6 years ago

Based on previous comments from Richard in an email dated 28/06/2018 13:59.

podpearson commented 5 years ago

Comments from email mentioned above are as follows:

I think this is a good start. I think what this particularly needs now is an introduction explaining why this is needed, what it is replacing, etc., and also pointers to what comes next. The descriptions of the tables and fields need expanding a bit - I think you should be able to understand every single field in the database using this document. I think we also need, either in this document, or as a separate document, details of the source data for this, how it is populated, etc.

Specifically I would suggest:

podpearson commented 5 years ago

@magnusmanske, could you talk to @sclaugoncalves to understand which library IDs are available, and ensure that the ones that get included in FITS are documented here?

Also, note that @alimanfoo has suggested (#6) using markdown for documentation, which I think is a good idea.

magnusmanske commented 5 years ago

I have ported the Google doc over to markdown, here.

I will incorporate some of the above suggestions. Note, however, that this is the database description document. It is not "FITS MVP", or FITS in general. It describes the database, not the philosophic rationale of having a file tracking system.

podpearson commented 5 years ago

> I will incorporate some of the above suggestions. Note, however, that this is the database description document. It is not "FITS MVP", or FITS in general. It describes the database, not the philosophic rationale of having a file tracking system.

Yes, fair point. However, I think the comments above should be captured somewhere in the documentation. If you think this is not the right place, could you decide where the right place is and document them there?

magnusmanske commented 5 years ago

The document is now here

podpearson commented 5 years ago

In the following could you:

Comments on this document (note some of these were previously in the comment dated Sep 28 and they haven't been addressed in the latest version):

"It does not describe FITS in total". But we need this overview somewhere, right? Could you

magnusmanske commented 5 years ago

We already have "overview" and "mvp_v1" for, well, an overview. I have linked to those now. I don't see the point in Yet Another Document to duplicate that information.

I have added some field information to the SQL schema itself, where it does not appear relevant for the main document.

I would rather not add the database access details into a git doc/issue. That's just bad form.

magnusmanske commented 5 years ago

I'm not sure we need guidelines for the notes. They are, by definition, free-form. My guideline recommendation is "use common sense".

podpearson commented 5 years ago

> We already have "overview" and "mvp_v1" for, well, an overview. I have linked to those now. I don't see the point in Yet Another Document to duplicate that information.

Sorry, I think I forgot there was already an overview document when writing this. The new links go to the raw .md files rather than the correctly rendered versions - could this be fixed?

> I have added some field information to the SQL schema itself, where it does not appear relevant for the main document.

OK, what might be useful is an example of how to access this information from the schema itself.
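
As a rough illustration of the kind of example meant here: the sketch below pulls per-column descriptions out of a schema file, assuming the field information is stored as MySQL-style `COMMENT '...'` clauses on each column. That storage style is an assumption, and the table, column names, and comment text in the excerpt are made up for illustration rather than copied from the FITS schema.

```python
import re

# Hypothetical schema excerpt. The table, columns, and comment text are
# illustrative only and are not copied from the real FITS schema.
SCHEMA_SQL = """
CREATE TABLE `sample` (
  `id` INT NOT NULL COMMENT 'internal primary key',
  `name` VARCHAR(64) COMMENT 'free-text sample label',
  PRIMARY KEY (`id`)
);
"""

# A backtick-quoted column name at the start of a line, followed on the
# same line by COMMENT '...'. A doubled '' inside the text is an escaped quote.
COLUMN_COMMENT = re.compile(
    r"^\s*`(\w+)`[^\n]*?COMMENT\s+'((?:[^']|'')*)'",
    re.MULTILINE | re.IGNORECASE,
)


def column_comments(schema_sql):
    """Return {column_name: comment} for every column with a COMMENT clause."""
    return {
        name: text.replace("''", "'")
        for name, text in COLUMN_COMMENT.findall(schema_sql)
    }


comments = column_comments(SCHEMA_SQL)
for column, text in comments.items():
    print(f"{column}: {text}")
```

The result is keyed by column name only, so a schema with several tables would need the table name added to the key. If the database is MySQL or MariaDB (again an assumption), the same comments can also be read from a live connection with `SHOW FULL COLUMNS FROM <table>` or from `information_schema.COLUMNS.COLUMN_COMMENT`.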

> I would rather not add the database access details into a git doc/issue. That's just bad form.

Fair point. How about including the details with the exception of the password and have a note saying "contact @magnusmanske for password" or similar?

magnusmanske commented 5 years ago

With this commit, I consider all points here addressed, and close the issue.

podpearson commented 5 years ago

@magnusmanske - I've just made a pull request with a few suggested small changes.

I also have a few follow-up questions:

  1. "this should be done automatically now, but may be missing from early imports". I find this worrying (see also comments on process doc). Could you retrospectively apply this to older imports? Presumably you still know all of the imports that were done, right?

  2. file_relation table. Is there a convention here about whether the BAM or the CRAM is considered the "parent"? I ask in particular because I'm wondering which will get selected in your code for creating a manifest when both are available.

  3. Why the need for sample.name? How might this be used? Has it been populated in a consistent way to date, e.g. does the sequencescape number represent anything specific?
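
To make question 2 concrete, here is a minimal sketch of one way the choice could be made deterministic: an explicit format preference order applied when several files exist for the same data. The function name, the preference order, and the file paths are all hypothetical; this is not a description of what the FITS manifest code currently does.

```python
# Hypothetical preference order: earlier entries win.
# This is NOT a documented FITS convention, only an illustration.
PREFERRED_FORMATS = ("cram", "bam")


def extension(path):
    """Lower-cased text after the last dot, or the whole name if no dot."""
    return path.rsplit(".", 1)[-1].lower()


def pick_file_for_manifest(paths, preferred=PREFERRED_FORMATS):
    """Return the path whose extension ranks highest in `preferred`, or None."""
    rank = {ext: position for position, ext in enumerate(preferred)}
    candidates = [p for p in paths if extension(p) in rank]
    if not candidates:
        return None
    # min() keeps the first path among equally ranked candidates.
    return min(candidates, key=lambda p: rank[extension(p)])


chosen = pick_file_for_manifest(["run1/sample.bam", "run1/sample.cram"])
print(chosen)
```

Whatever convention is agreed, writing it down as a single ordered list like this would make it easy to document alongside the file_relation table.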

Please address the above by making pull requests, and outlining answers to each of the above (including question number) either in the pull request comments, or else as comments in this issue.

This issue should be left open until we have sign-off, i.e. agreement at a production meeting that this is good to go.