ohsu-comp-bio / dms-es

elastic search for federated data management DEPRECATED - see intel repo

Logstash Project Organization and Standard File Names #1

Open jacmarjorie opened 8 years ago

jacmarjorie commented 8 years ago

As a Variant Store consumer of a project's individual, sample, and resource (MAF) TSVs, I need consistent naming and organization across projects. That would simplify access to each project's TSVs so I can build a CallSet resource that is identifiable by ES and can be read into the Kibana interface.

Right now, this is the structure I see for project organization in the logstash directory:

└── logstash
    ├── README.md
    ├── baml
    │   ├── BeatAML_Data_Source.xlsx
    │   ├── BeatAML_rnaseq_2015_10_20_public_dashboard.xlsx
    │   ├── BeatAML_seqcap_2015_10_20_public_dashboard.csv
    │   ├── BeatAML_seqcap_2015_10_20_public_dashboard.tsv
    │   ├── BeatAML_seqcap_2015_10_20_public_dashboard.xlsx
    │   ├── Diagnosis_Labs_Treatments_Outcomes_2015_10_27.xlsx
    │   ├── baml-dataset.conf
    │   ├── baml-individual.conf
    │   ├── baml-resource.conf
    │   ├── baml-specimen.conf
    │   ├── dataset.tsv
    │   ├── default_index_template.json
    │   ├── dna-resources.tsv
    │   ├── individual.tsv
    │   ├── resource.tsv
    │   └── specimen.tsv
    ├── denormalizer
    │   ├── Dockerfile
    │   ├── README.md
    │   ├── aggregated_resource.py
    │   └── requirements.txt
    ├── example-data
    │   ├── README.md
    │   └── icgc
    │       └── icgc-dataset-1446768311064
    │           ├── donor.tsv
    │           ├── donor_exposure.tsv.gz
    │           ├── donor_family.tsv.gz
    │           ├── donor_therapy.tsv.gz
    │           ├── sample.tsv
    │           ├── specimen.tsv
    │           ├── ssm_open-truncate.tsv
    │           └── ssm_open.tsv.gz
    ├── icgc
    │   ├── default_index_template.json
    │   ├── donor.tsv
    │   ├── icgc-donor.conf
    │   ├── icgc-resource.conf
    │   ├── icgc-sample.conf
    │   ├── icgc-specimen.conf
    │   ├── sample.tsv
    │   ├── specimen.tsv
    │   └── ssm_resources.tsv
    ├── june_demo
    │   ├── Austin
    │   │   ├── genome_data.tsv
    │   │   ├── images.tsv
    │   │   ├── projects.tsv
    │   │   └── specimens.tsv
    │   ├── OHSU-Data.xlsx
    │   ├── Portland
    │   │   ├── genome_data.tsv
    │   │   ├── images.tsv
    │   │   ├── projects.tsv
    │   │   └── specimens.tsv
    │   ├── ThirdSite
    │   │   ├── genome_data.tsv
    │   │   ├── images.tsv
    │   │   ├── projects.tsv
    │   │   └── specimens.tsv
    │   ├── datasets.tsv
    │   ├── deleteIndexes.sh
    │   ├── individual.tsv
    │   ├── ohsu-datasets.conf
    │   ├── ohsu-individual.conf
    │   ├── ohsu-specimens.conf
    │   └── specimens.tsv
    └── start_logstash.sh
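
To make the inconsistency concrete, here is a rough Python sketch that reports which project directories lack a given set of standard file names. The names in `EXPECTED_FILES` and the helper `missing_standard_files` are hypothetical placeholders for illustration only; nothing in this thread has settled on them yet.

```python
from pathlib import Path

# Hypothetical standard names, used only for illustration.
# The actual convention is still to be decided in this issue.
EXPECTED_FILES = ("individual.tsv", "specimen.tsv", "resource.tsv")


def missing_standard_files(logstash_dir, expected=EXPECTED_FILES):
    """Map each project directory name to the expected files it lacks.

    Projects that contain every expected file are left out of the result.
    Plain files directly under logstash_dir (README.md, scripts) are skipped.
    """
    report = {}
    for project in sorted(Path(logstash_dir).iterdir()):
        if not project.is_dir():
            continue
        present = {p.name for p in project.iterdir() if p.is_file()}
        missing = [name for name in expected if name not in present]
        if missing:
            report[project.name] = missing
    return report
```

Calling `missing_standard_files("logstash")` from the repository root would list, per project, the files a consumer cannot rely on. With the placeholder names above, a project that only ships `sample.tsv` and `ssm_resources.tsv` in place of `individual.tsv` and `resource.tsv` would be flagged.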

jacmarjorie commented 8 years ago

Ideally, the MAF resource TSV would have the following columns from which I could inherit the columns I need (naming conventions aside):

individual_id normal_sample_id tumor_sample_id maf_resource_id
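
As a minimal sketch of how I would consume such a file, assuming it is tab-separated with a header row containing the four columns above: the function name `read_callsets` and the shape of the returned records are my own assumptions, not an existing API in this repo.

```python
import csv

# Column names taken from the proposal above.
REQUIRED_COLUMNS = (
    "individual_id",
    "normal_sample_id",
    "tumor_sample_id",
    "maf_resource_id",
)


def read_callsets(handle):
    """Read a MAF resource TSV and return one dict per row.

    `handle` is any open text file object. Only the required columns are
    kept, so extra project-specific columns are ignored. Raises ValueError
    if a required column is missing from the header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"resource TSV is missing columns: {missing}")
    return [{c: row[c] for c in REQUIRED_COLUMNS} for row in reader]
```

Usage would look roughly like this, with `resource.tsv` standing in for whatever file name we agree on:

```python
with open("resource.tsv", newline="") as handle:
    for callset in read_callsets(handle):
        print(callset["individual_id"], callset["maf_resource_id"])
```

Failing loudly on a missing column is a deliberate choice here: it surfaces naming drift between projects at load time instead of producing partially populated CallSet documents in ES.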