Open christopher-hakkaart opened 2 years ago
A secondary objective will be to reserve global samplesheet headers
I did a really quick and dirty scrape of the schema.json from the pipelines listed as released on the website. 3 of those did not appear to have a schema.json in master and got skipped: mnaseseq, imcyto, slamseq. This leaves 44 pipelines.
Here are all the params which appeared in more than one pipeline:
outdir 44
email 44
custom_config_version 44
custom_config_base 44
config_profile_description 44
config_profile_contact 44
config_profile_url 44
max_cpus 44
max_memory 44
max_time 44
help 44
email_on_fail 44
plaintext_email 44
monochrome_logs 44
tracedir 43
input 42
publish_dir_mode 42
max_multiqc_email_size 40
validate_params 40
show_hidden_params 40
config_profile_name 39
multiqc_config 38
multiqc_title 33
hook_url 29
igenomes_ignore 28
igenomes_base 27
genome 25
version 24
multiqc_logo 24
multiqc_methods_description 24
fasta 23
skip_multiqc 17
save_reference 15
aligner 14
gtf 13
enable_conda 13
hostnames 12
skip_fastqc 12
clip_r1 10
three_prime_clip_r1 10
clip_r2 9
three_prime_clip_r2 9
save_trimmed 9
gff 8
skip_trimming 8
trim_nextseq 7
star_index 7
seq_center 7
skip_qc 7
protocol 6
save_unaligned 6
save_align_intermeds 6
gene_bed 6
bwa_index 6
skip_preseq 5
single_end 5
enzyme 4
trim_fastq 4
star_ignore_sjdbgtf 4
read_length 4
skip_alignment 4
save_merged_fastq 4
blacklist 4
skip_igv 4
skip_peak_qc 4
macs_gsize 4
name 4
database 3
decoy_method 3
precursor_mass_tolerance 3
fragment_mass_tolerance 3
fixed_mods 3
variable_mods 3
min_peptide_length 3
max_peptide_length 3
num_hits 3
subset_max_train 3
klammer 3
description_correct_features 3
quantification_method 3
contrasts 3
singularity_pull_docker_container 3
bowtie2_index 3
save_trimmed_fail 3
skip_cutadapt 3
skip_markduplicates 3
skip_picard_metrics 3
seq_platform 3
keep_dups 3
deseq2_vst 3
skip_deseq2_qc 3
skip_peak_annotation 3
skip_plot_profile 3
root_folder 2
local_input_type 2
add_decoys 2
openms_peakpicking 2
peakpicking_inmemory 2
peakpicking_ms_levels 2
search_engines 2
num_enzyme_termini 2
allowed_missed_cleavages 2
precursor_mass_tolerance_unit 2
fragment_mass_tolerance_unit 2
fragment_method 2
isotope_error_range 2
instrument 2
min_precursor_charge 2
max_precursor_charge 2
max_mods 2
db_debug 2
enable_mod_localization 2
mod_localization 2
luciphor_neutral_losses 2
luciphor_decoy_mass 2
luciphor_decoy_neutral_losses 2
luciphor_debug 2
IL_equivalent 2
posterior_probabilities 2
pp_debug 2
FDR_level 2
train_FDR 2
test_FDR 2
outlier_handling 2
consensusid_algorithm 2
consensusid_considered_top_hits 2
min_consensus_support 2
protein_level_fdr_cutoff 2
protein_quant 2
mass_recalibration 2
transfer_ids 2
targeted_only 2
skip_post_msstats 2
ref_condition 2
enable_qc 2
ptxqc_report_layout 2
skip_pycoqc 2
skip_nanoplot 2
kraken2_db 2
skip_kraken2 2
skip_fastp 2
variant_caller 2
min_mapped_reads 2
mode 2
adapter_fasta 2
save_databases 2
transcript_fasta 2
salmon_index 2
tools 2
trim 2
fai 2
malt_mode 2
stranded 2
skip_quantification 2
skip_bigwig 2
peakcaller 2
annotation_tool 2
with_umi 2
umitools_dedup_stats 2
dragmap 2
skip_tools 2
split_fastq 2
no_intervals 2
snpeff_cache 2
vep_cache 2
dbsnp 2
dbsnp_tbi 2
dict 2
fasta_fai 2
known_indels 2
known_indels_tbi 2
mappability 2
snpeff_db 2
vep_genome 2
vep_species 2
vep_cache_version 2
remove_ribo_rna 2
ribo_database_manifest 2
save_non_ribo_reads 2
bam_csi_index 2
skip_qualimap 2
fasta_index 2
skip_deduplication 2
skip_decoy_generation 2
fragment_size 2
chromap_index 2
keep_multi_map 2
bwa_min_score 2
bamtools_filter_pe_config 2
bamtools_filter_se_config 2
narrow_peak 2
broad_cutoff 2
macs_fdr 2
macs_pvalue 2
min_reps_consensus 2
save_macs_pileup 2
skip_consensus_peaks 2
skip_plot_fingerprint 2
fingerprint_bins 2
krakendb 2
bowtie_index 2
ncrna 2
There are over 2000 params that appear in only a single pipeline. I am not sure how many of those are similarly named but not identical to params in other pipelines, and should perhaps be standardised.
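One way to get a rough handle on the "similarly named but not identical" question is to compare names pairwise with a string-similarity ratio. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The function name `similar_pairs` and the 0.8 threshold are assumptions chosen for illustration, not anything agreed in this thread, and the sample names are taken from the list above.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.8):
    """Return (name_a, name_b, ratio) for every pair of distinct names
    whose similarity ratio is at or above `threshold`.

    `threshold` is an arbitrary starting point (an assumption); it would
    need tuning against the real list of parameter names.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    # Most similar pairs first
    return sorted(pairs, key=lambda pair: -pair[2])


if __name__ == "__main__":
    # A few names from the counts above, used only as sample input
    sample = ["kraken2_db", "krakendb", "fasta_fai", "fasta_index", "fai", "outdir"]
    for a, b, ratio in similar_pairs(sample):
        print(f"{a} ~ {b} ({ratio})")
```

Pairwise comparison is quadratic in the number of names, which should still be manageable for a few thousand params. Pure string similarity will miss synonyms such as `fasta_fai` and `fasta_index`, so the output would only be a candidate list for manual review.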
On top of that, I'd like a global list of meta.map fields
Create a global parameters list
Terminology between pipelines and shared assets can differ. To help preserve shared content and familiarity between pipelines, subworkflows and modules, it would be beneficial to create a reserved ontology. For example, parameter names such as --bwa_index and --bwa should be reserved.

A reserved ontology list needs to be created. There might be examples from elsewhere we could use as a starting point. We could also scrape all JSON schema files and build a big list (and link to it in the writing pipelines tutorial). The final product could be a list with clear descriptions that developers can use to guide naming conventions.
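The scraping idea could be sketched roughly as follows. This is a minimal sketch, not the script used for the counts in this thread. It assumes each schema groups its parameters under a top-level `definitions` (or `$defs`) object whose entries each have a `properties` object, plus an optional top-level `properties`. The function names and the two tiny inline schemas are hypothetical, and fetching the files from each repository is left out.

```python
import json
from collections import Counter


def param_names(schema):
    """Collect parameter names from one parsed schema.

    Assumes params live in top-level "properties" and in the "properties"
    of each group under "definitions" or "$defs" (an assumption about the
    schema layout, not something confirmed in this thread).
    """
    names = set(schema.get("properties", {}))
    for key in ("definitions", "$defs"):
        for group in schema.get(key, {}).values():
            names.update(group.get("properties", {}))
    return names


def count_params(schemas):
    """Count, for each param name, how many pipelines define it.

    `schemas` maps a pipeline name to its parsed schema dict.
    """
    counts = Counter()
    for schema in schemas.values():
        counts.update(param_names(schema))
    return counts


def shared_params(counts, minimum=2):
    """Return (name, count) pairs for params seen in at least `minimum` pipelines."""
    return [(name, n) for name, n in counts.most_common() if n >= minimum]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed schemas standing in for real files
    raw = {
        "pipeline_a": '{"definitions": {"io": {"properties": {"input": {}, "outdir": {}}}}}',
        "pipeline_b": '{"definitions": {"io": {"properties": {"outdir": {}, "bwa_index": {}}}}}',
    }
    schemas = {name: json.loads(text) for name, text in raw.items()}
    for name, n in shared_params(count_params(schemas)):
        print(name, n)
```

Keeping the description of each param alongside its name would be a natural next step for building the reserved list, since the stated goal is a list with clear descriptions.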