waldronlab / curatedMetagenomicDataCuration

Sample Metadata Curation for curatedMetagenomicData
https://waldronlab.io/curatedMetagenomicDataCuration/
28 stars 23 forks source link

change col.name variables #54

Closed schifferl closed 3 years ago

schifferl commented 3 years ago

To improve consistency in naming of variables across all parts of the pipeline and APIs we should make the following changes to the template (and curated files).

Current Name -> New Name dataset_name -> study_name sampleID -> sample_id subjectID -> subject_id

Currently, study_name is not present in curated files and I generate it in the R package – it should be added to them.

paolinomanghi commented 3 years ago

I changed the columns. I also changed the dataset_name to study_name. I wouldn't add the dataset name to the tables: it would be a) the same name of the table without "_metadata.tsv" b) a different name, generating a relevant confusion, possibly

what do you think?

jwokaty commented 3 years ago

I'm making this note here since this issue is not yet closed, but I can create new issues.

"dataset_name" gets added when using makeCombinedMetadata to construct combined_metadata and checkCuration uses the /inst/extdata/template.csv. I used checkCuration in cMD's github action to update combined_metadata and determine if the curation is passing or failing, so we should also open an issue in cMD to change the dataset_name column to study_name.

paolinomanghi commented 3 years ago

Ok. It's not clear to me indeed if we can close this issue or not (as now the template.csv has study_name). Please, close if you think it can be closed

schifferl commented 3 years ago

Thank you for making the changes @paolinomanghi, the reading in is now easier for me. From above, I am doing as you say in choice a), but it would be just the same to do it here. Go ahead, and do that if it is easy enough for you. The variable study_name would be the table name without "_metadata.tsv". The issues raised by @jwokaty are good points too, but we don't need makeCombinedMetadata anymore (this happens in cMD itself). So, ok to go ahead with the final change.

paolinomanghi commented 3 years ago

So, just to be clear: I don't modify the tables but I do switch from dataset_name to study_name in the grammar?

schifferl commented 3 years ago

Hi @paolinomanghi, I think change the grammar and change the tables. This keeps one more step in the curation repo, where it seems to fit.

paolinomanghi commented 3 years ago

I've pushed the changes plus some other very little corrections to the tables. I'm closing this right now.