legumeinfo / ongenome_schema

Database schema suitable for gene expression RNASeq data and other count-based data mapped to a reference genome

Add 'Key' Column to Samples Sheet #6

Closed: sdash-github closed this issue 7 years ago

sdash-github commented 7 years ago

For loading, sample uniquenames must match the column headers in the expression data file. The data processing steps use the SRR run accession number as the column header for each sample, because the SRA data files use this number as their file name. Until now, I have been replacing the column headers in the data files with sample uniquenames almost manually, so that they match the biomaterial-related tables in Chado.

Discussion with Connor: add an extra 'key' column to the 'Samples' sheet that holds these SRR numbers (a duplicate of the 'sra_run' column, serving as the 'key' column). The loader would refer to this column to look up the correspondence between a sample uniquename and its data column.

Future-proofing: if a dataset is not from SRA, we can use keys of our choice instead of the sra_run accession.
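The lookup described above could be sketched roughly as follows. This is only an illustration, not the actual loader: the function names, the 'sample_uniquename' column name, and the sample data are all hypothetical, and the 'Samples' sheet is assumed to have been exported as tab-separated text.

```python
import csv
import io


def build_key_map(samples_rows, key_col="key", name_col="sample_uniquename"):
    """Map each key (e.g. an SRR run accession) to its sample uniquename.

    Column names are assumptions; adjust them to the real 'Samples' sheet.
    """
    key_map = {}
    for row in samples_rows:
        key = row[key_col].strip()
        if key in key_map:
            raise ValueError(f"duplicate key in Samples sheet: {key}")
        key_map[key] = row[name_col].strip()
    return key_map


def rename_headers(header, key_map, id_columns=1):
    """Replace key-based column headers with sample uniquenames.

    The first `id_columns` columns (e.g. a gene ID column) are kept as-is.
    """
    renamed = list(header[:id_columns])
    for col in header[id_columns:]:
        if col not in key_map:
            raise KeyError(f"no sample found for data column: {col}")
        renamed.append(key_map[col])
    return renamed


# Hypothetical 'Samples' sheet exported as TSV (made-up names and accessions).
samples_tsv = (
    "sample_uniquename\tsra_run\tkey\n"
    "leaf_rep1\tSRR0000001\tSRR0000001\n"
    "root_rep1\tSRR0000002\tSRR0000002\n"
)
samples = list(csv.DictReader(io.StringIO(samples_tsv), delimiter="\t"))
key_map = build_key_map(samples)

# Hypothetical data file header that uses the keys as column names.
header = ["gene_id", "SRR0000002", "SRR0000001"]
new_header = rename_headers(header, key_map)
```

Because the lookup goes through the 'key' column rather than 'sra_run' directly, the same code would work for a non-SRA dataset as long as the keys in the sheet match the data file headers.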

sdash-github commented 7 years ago

-- Added a 'key' column to the spreadsheet.
-- Added a 2nd row to the data file containing just the SRR number (key). Remove the first row if you have to.

** I suppose it is now ready for loading.
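A reader for a data file with this two-row header might look like the sketch below. It assumes a tab-separated file whose first row holds the original headers and whose second row holds the bare keys, with an identifier label in the first column of both rows; the function name and the example values are hypothetical.

```python
import csv
import io


def read_expression_table(handle, delimiter="\t"):
    """Read a table with two header rows and return (key_row, data_rows).

    Assumption: row 1 is the original header and is discarded here;
    row 2 holds the keys (e.g. SRR numbers) and is used as the header.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)            # skip the first header row
    key_row = next(reader)  # second row: keys only
    data_rows = [row for row in reader]
    return key_row, data_rows


# Hypothetical data file contents (made-up headers and counts).
data_tsv = (
    "gene_id\tleaf_SRR0000001\troot_SRR0000002\n"
    "gene_id\tSRR0000001\tSRR0000002\n"
    "gene1\t10\t0\n"
    "gene2\t3\t7\n"
)
keys, rows = read_expression_table(io.StringIO(data_tsv))
```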

sdash-github commented 7 years ago

Done, and Connor used it to load the expression data.