riboviz / example-datasets

Example datasets to run with RiboViz
Apache License 2.0

riboviz.tools.upgrade_config_file: provenance placement #82

Open swinterbourne opened 3 years ago

swinterbourne commented 3 years ago

This issue is linked to #76 and comes from testing the riboviz.tools.upgrade_config_file tool from riboviz 2.1.

I used the command python -m riboviz.tools.upgrade_config_file -i ../example-datasets/fungi/saccharomyces/Lareau_2014_Replicates_RPF_3-samples_CDS_w_250utrs_config.yaml -o ../example-datasets/fungi/saccharomyces/Lareau_2014_Replicates_RPF_3-samples_CDS_w_250utrs_config_upgraded.yaml on the Lareau et al 2014 dataset to upgrade the config file.

The new config file is organised in alphabetical order. As a result, the provenance has moved from the top of the config file to the middle. I assume that we want to keep the provenance at the top of the config file?

Upgraded config_file:

adapters: CTGTAGGCACCATCAAT
asite_disp_length_file: data/yeast_standard_asite_disp_length.txt
buffer: 250
build_indices: true
codon_positions_file: data/yeast_codon_pos_i200.RData
count_reads: true
count_threshold: 64
dataset: L-Sc_2014
dedup_stats: false
dedup_umis: false
dir_in: L-Sc_2014/input
dir_index: L-Sc_2014/index
dir_out: L-Sc_2014/output
dir_tmp: L-Sc_2014/tmp
do_pos_sp_nt_freq: true
extract_umis: false
feature: CDS
features_file: data/yeast_features.tsv
fq_files:
  Replicate-1: SRR1363412.fastq.gz
  Replicate-2: SRR1363413.fastq.gz
  Replicate-3: SRR1363414.fastq.gz
group_umis: false
is_riboviz_gff: true
job_email: null
job_email_events: beas
job_memory: 8G
job_name: riboviz
job_num_cpus: 4
job_parallel_env: mpi
job_runtime: '48:00:00'
make_bedgraph: true
max_read_length: 50
min_read_length: 10
multiplex_fq_files: null
nextflow_dag_file: nextflow-dag.html
nextflow_report_file: nextflow-report.html
nextflow_timeline_file: nextflow-timeline.html
nextflow_trace_file: nextflow-trace.tsv
nextflow_work_dir: work
num_processes: 16
orf_fasta_file: ../../riboviz/example-datasets/fungi/saccharomyces/annotation/Saccharomyces_cerevisiae_yeast_CDS_w_250utrs.fa
orf_gff_file: ../../riboviz/example-datasets/fungi/saccharomyces/annotation/Saccharomyces_cerevisiae_yeast_CDS_w_250utrs.gff3
orf_index_prefix: yeast_CDS_w_250
output_pdfs: true
primary_id: Name
provenance:
  DOI: https://doi.org/10.7554/eLife.01257
  GEO: GSE58321
  date run: 2021-02-21
  notes: Replicates carried out by Gamble et al 2016 to confirm findings of ICPs from
    Jan et al 2014.
  reference: Distinct stages of the translation elongation cycle revealed by sequencing
    ribosome-protected mRNA fragments, Lareau et. al. 2014
  riboviz-version: 2.0 | COMMIT e94a909c70942cb428450a3b98edef1614143c18
  website: null
  yaml authors:
  - author: Felicity Anderson
    email: Felicity.Anderson@ed.ac.uk
  - author: Sophie Winterbourne
    email: '...'
publish_index_tmp: false
rpf: true
rrna_fasta_file: ../../riboviz/example-datasets/fungi/saccharomyces/contaminants/Saccharomyces_cerevisiae_yeast_rRNA_R64-1-1.fa
rrna_index_prefix: yeast_rRNA
run_static_html: true
sample_sheet: null
samsort_memory: 768M
secondary_id: null
skip_inputs: false
stop_in_feature: false
t_rna_file: data/yeast_tRNAs.tsv
trim_5p_mismatches: true
umi_regexp: null
validate_only: false

3mma-mack commented 3 years ago

I've encountered the same thing with the Duncan-Sp_2018 upgraded config file: all of its keys are in alphabetical order.

ewallace commented 3 years ago

@mikej888 let's discuss this at least briefly next dev meeting.

mikej888 commented 3 years ago

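In Python 3.6+ dicts keep their keys in insertion order by default (a language guarantee from 3.7), so the order of the original file survives loading. However, PyYAML's yaml.dump sorts keys alphabetically unless told otherwise. See https://stackoverflow.com/questions/5121931/in-python-how-can-you-load-yaml-mappings-as-ordereddicts.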

Commit 13943f6 in riboviz develop sets yaml.dump sort_keys=False.
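
For anyone following along, here is a minimal sketch of the difference that sort_keys makes. It assumes PyYAML 5.1 or later (the first release to accept sort_keys) and uses a shortened, hypothetical config dict rather than the real Lareau file. It is not the riboviz code itself, just an illustration of the yaml.dump behaviour.

```python
import yaml  # PyYAML; the sort_keys argument needs PyYAML 5.1 or later

# Hypothetical, shortened config with provenance listed first,
# mimicking the layout of the original (pre-upgrade) file.
config = {
    "provenance": {"GEO": "GSE58321"},
    "dataset": "L-Sc_2014",
    "adapters": "CTGTAGGCACCATCAAT",
}

# Default behaviour: mapping keys are written in alphabetical order.
sorted_text = yaml.dump(config)

# With sort_keys=False the dict's insertion order is written out as-is.
ordered_text = yaml.dump(config, sort_keys=False)

print(sorted_text)
print(ordered_text)
```

With the default call, provenance ends up after dataset; with sort_keys=False it stays at the top, which is the behaviour asked for in this issue.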