pepkit / peppy

Project metadata manager for PEPs in Python
https://pep.databio.org/peppy
BSD 2-Clause "Simplified" License

Make a peppy.Project from a sample yaml file #458

Closed donaldcampbelljr closed 7 months ago

donaldcampbelljr commented 7 months ago
For example, I would like to make a `peppy.Project` from a sample YAML file:
- sample_name: sample 1
  file: path/to/file.tsv
- sample_name: sample 2
  file: path/to/2.tsv
prj = peppy.Project(sample_yaml="path.yaml")

or `prj = peppy.Project(sample_yaml={ ... sample dict ... })`

Originally posted by @nsheff in https://github.com/pepkit/peppy/issues/457#issuecomment-1841195131
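Until something like `sample_yaml=` exists, one way to get there is to turn the list of sample records into an ordinary PEP sample table plus a minimal project config. The sketch below is not a peppy feature. It assumes the sample YAML has already been parsed into a list of dicts (for example with PyYAML's `yaml.safe_load`), and the helper names `records_to_sample_table` and `minimal_project_config` are hypothetical. Columns are the union of all keys in first-seen order, and samples missing a column get an empty cell.

```python
import csv
import io


def records_to_sample_table(records: list[dict]) -> str:
    """Hypothetical helper: render sample records as CSV text for a PEP sample table."""
    # Union of all keys, keeping first-seen order (dicts preserve insertion order).
    columns = list(dict.fromkeys(key for record in records for key in record))
    buffer = io.StringIO()
    # restval="" fills in columns that a given sample does not define.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def minimal_project_config(sample_table_path: str) -> str:
    """Hypothetical helper: a minimal PEP 2.x project config pointing at the table."""
    return f"pep_version: 2.1.0\nsample_table: {sample_table_path}\n"


samples = [
    {"sample_name": "sample 1", "file": "path/to/file.tsv"},
    {"sample_name": "sample 2", "file": "path/to/2.tsv"},
]
print(records_to_sample_table(samples))
print(minimal_project_config("samples.csv"))
```

Writing the two strings to `samples.csv` and `project_config.yaml` in the same directory should then let you load the project with the regular constructor, `peppy.Project("project_config.yaml")`.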

khoroshevskyi commented 7 months ago

As far as I know, peppy doesn't have this functionality.

But there is a workaround: you can use `from_dict`:

project_dict = read_yaml(...)
# insert project_dict into this function
# (the values below describe the expected type of each key):
prj = Project().from_dict({'_sample_df': dict,
                           '_config': dict,
                           '_subsample_list': list[dict],
                           'name': str,
                           'description': str})
khoroshevskyi commented 7 months ago

@nsheff, could you provide an example YAML file that could be used as input to this function?

nsheff commented 7 months ago

Here's another example:

- assembly: HG01891.alt.pat.f1_v2
  population: African Caribbean In Barbados
  assembly_accession: GCA_018467165.1
  assembly_link: https://www.ebi.ac.uk/ena/browser/view/GCA_018467165.1
  assembly_submitter: UCSC Genomics Institute
  annotation_gtf: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018467165.1/ensembl/geneset/2022_07/Homo_sapiens-GCA_018467165.1-2022_07-genes.gtf.gz
  annotation_gff3: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018467165.1/ensembl/geneset/2022_07/Homo_sapiens-GCA_018467165.1-2022_07-genes.gff3.gz
  proteins: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018467165.1/ensembl/geneset/2022_07/Homo_sapiens-GCA_018467165.1-2022_07-pep.fa.gz
  transcripts: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018467165.1/ensembl/geneset/2022_07/Homo_sapiens-GCA_018467165.1-2022_07-cdna.fa.gz
  variants_clinvar: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018467165.1/ensembl/variation/2022_10/vcf/Homo_sapiens-GCA_018467165.1-2022_10-clinvar.vcf.gz
  variants_gnomad: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018467165.1/ensembl/variation/2022_10/vcf/Homo_sapiens-GCA_018467165.1-2022_10-gnomad.vcf.gz
  ftp_dumps: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018467165.1
  rapid_link: https://rapid.ensembl.org/Homo_sapiens_gca018467165v1/Info/Index
  file_name: Homo_sapiens-GCA_018467165.1-unmasked.fa.gz
  url: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018467165.1/ensembl/genome/Homo_sapiens-GCA_018467165.1-unmasked.fa.gz
  local_file: data/HG01891.alt.pat.f1_v2.unmasked.fa.gz
  remote_md5: f80e3ab39c3a3245cc7d3edadac1adfd
  fasta: analysis/data/HG01891.alt.pat.f1_v2.unmasked.fa.gz
- assembly: HG01258.pri.mat.f1_v2
  population: Colombian In Medellin, Colombia
  assembly_accession: GCA_018469405.1
  assembly_link: https://www.ebi.ac.uk/ena/browser/view/GCA_018469405.1
  assembly_submitter: UCSC Genomics Institute
  annotation_gtf: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018469405.1/ensembl/geneset/2022_07/Homo_sapiens-GCA_018469405.1-2022_07-genes.gtf.gz
  annotation_gff3: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018469405.1/ensembl/geneset/2022_07/Homo_sapiens-GCA_018469405.1-2022_07-genes.gff3.gz
  proteins: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018469405.1/ensembl/geneset/2022_07/Homo_sapiens-GCA_018469405.1-2022_07-pep.fa.gz
  transcripts: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018469405.1/ensembl/geneset/2022_07/Homo_sapiens-GCA_018469405.1-2022_07-cdna.fa.gz
  variants_clinvar: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018469405.1/ensembl/variation/2022_10/vcf/Homo_sapiens-GCA_018469405.1-2022_10-clinvar.vcf.gz
  variants_gnomad: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018469405.1/ensembl/variation/2022_10/vcf/Homo_sapiens-GCA_018469405.1-2022_10-gnomad.vcf.gz
  ftp_dumps: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018469405.1
  rapid_link: https://rapid.ensembl.org/Homo_sapiens_gca018469405v1/Info/Index
  file_name: Homo_sapiens-GCA_018469405.1-unmasked.fa.gz
  url: https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_018469405.1/ensembl/genome/Homo_sapiens-GCA_018469405.1-unmasked.fa.gz
  local_file: data/HG01258.pri.mat.f1_v2.unmasked.fa.gz
  remote_md5: cf7d737137c312357b409b962eea0494
  fasta: analysis/data/HG01258.pri.mat.f1_v2.unmasked.fa.gz
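Note that these records have no `sample_name` field; the natural identifier is `assembly`. PEP sample tables identify samples by `sample_name` by default, so a conversion step may need to copy the identifier over first. The sketch below is a hypothetical helper (`add_sample_names`, with `id_key` defaulting to `"assembly"` as an assumption based on this example), not part of peppy. It also rejects missing or repeated identifiers so problems surface before a `Project` is built.

```python
def add_sample_names(records: list[dict], id_key: str = "assembly") -> list[dict]:
    """Hypothetical helper: copy each record, adding sample_name taken from id_key."""
    seen: set[str] = set()
    named = []
    for position, record in enumerate(records):
        if id_key not in record:
            raise ValueError(f"record {position} has no {id_key!r} field")
        name = str(record[id_key])
        if name in seen:
            raise ValueError(f"duplicate {id_key!r} value: {name}")
        seen.add(name)
        # sample_name goes first; keys already in the record win, so an
        # existing sample_name is kept as-is.
        named.append({"sample_name": name, **record})
    return named


example = [
    {"assembly": "HG01891.alt.pat.f1_v2", "population": "African Caribbean In Barbados"},
    {"assembly": "HG01258.pri.mat.f1_v2", "population": "Colombian In Medellin, Colombia"},
]
print([r["sample_name"] for r in add_sample_names(example)])
# ['HG01891.alt.pat.f1_v2', 'HG01258.pri.mat.f1_v2']
```

Copying rather than mutating keeps the parsed YAML untouched, which makes it easy to compare the original records against what ends up in the sample table.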