sanger-tol / treeval

Pipelines for the production of Treeval data
https://pipelines.tol.sanger.ac.uk/treeval

Generalise yaml values #168

Closed DLBPointon closed 9 months ago

DLBPointon commented 10 months ago

Description of the bug

Currently, a few variables in the YAML would make no sense to an outside reader.

gevalType is one example; it should be replaced with ProjectID or TicketType.

Add readType ('hifi', 'clr', 'ont', 'illumina') as a new value, to be used in read coverage (see ASCC).
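
As a rough illustration of how the new read type value might be checked before it reaches the read coverage step, here is a minimal sketch. Only the four allowed values come from this issue; the function name `validate_read_type`, the case-insensitive matching, and the error wording are assumptions, not part of the pipeline.

```python
# Allowed values are taken from the issue text; everything else here
# (names, normalisation, error message) is a hypothetical illustration.
ALLOWED_READ_TYPES = ("hifi", "clr", "ont", "illumina")


def validate_read_type(value: str) -> str:
    """Return the normalised read type, or raise ValueError if unsupported."""
    # Assumption: surrounding whitespace and letter case are not significant.
    normalised = value.strip().lower()
    if normalised not in ALLOWED_READ_TYPES:
        raise ValueError(
            f"Unsupported read type {value!r}; expected one of {ALLOWED_READ_TYPES}"
        )
    return normalised
```

Failing early on an unknown value would give a clearer error than a failure deep inside the coverage step, though where such a check belongs in the pipeline is left open here.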

Command used and terminal output

No response

Relevant files

No response

System information

No response

DLBPointon commented 9 months ago

Proposed structure:

assembly:
  assem_level: scaffold
  assem_version: 1
  sample_id: Oscheius_DF5033
  latin_name: to_provide_taxonomic_rank
  defined_class: nematode
  project_id: DTOL
reference_file: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/assembly/draft/DF5033.hifiasm.noTelos.20211120/DF5033.noTelos.hifiasm.purged.noCont.noMito.fasta
assem_reads:
  longread_type: hifi
  longread_data: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/pacbio/fasta/
  hic_data: /lustre/scratch123/tol/resources/treeval/treeval-testdata/TreeValSmallData/Oscheius_DF5033/genomic_data/nxOscSpes1/hic-arima2/full/
  supplementary_data: path
alignment:
  data_dir: /lustre/scratch123/tol/resources/treeval/gene_alignment_data/
  common_name: "" # For future implementation (adding bee, wasp, ant etc)
  geneset_id: "OscheiusTipulae.ASM1342590v1,CaenorhabditisElegans.WBcel235,Gae_host.Gae"
  #Path should end up looking like "{data_dir}{classT}/{common_name}/csv_data/{geneset}-data.csv"
self_comp:
  motif_len: 0
  mummer_chunk: 10
intron:
  size: "50k"
telomere:
  teloseq: TTAGGG
synteny:
  synteny_genome_path: /lustre/scratch123/tol/resources/treeval/synteny/
busco:
  lineages_path: /lustre/scratch123/tol/resources/busco/v5
  lineage: nematoda_odb10
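
The path template in the `alignment` comment can be sketched as follows. This is an illustration only: it assumes that `{classT}` corresponds to `defined_class`, that `geneset_id` is split on commas, and that an empty `common_name` is simply dropped from the path. The function name `geneset_csv_paths` is hypothetical.

```python
import posixpath


def geneset_csv_paths(data_dir, defined_class, common_name, geneset_id):
    """Build one CSV path per geneset, following the template
    {data_dir}{classT}/{common_name}/csv_data/{geneset}-data.csv."""
    # Assumption: geneset_id is a comma-separated list of geneset names.
    genesets = [g.strip() for g in geneset_id.split(",") if g.strip()]
    # Assumption: empty segments (e.g. common_name: "") are skipped
    # rather than producing a doubled slash.
    prefix = [seg for seg in (data_dir, defined_class, common_name, "csv_data") if seg]
    return [posixpath.join(*prefix, f"{g}-data.csv") for g in genesets]
```

With the values proposed above (`defined_class: nematode`, an empty `common_name`, and three genesets), this would produce three paths under `{data_dir}nematode/csv_data/`, one per geneset.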