Create a global parameters list

christopher-hakkaart commented 2 years ago


Terminology can differ between pipelines and shared assets. To keep shared content consistent and familiar across pipelines, subworkflows and modules, it would be beneficial to create a reserved ontology. For example, parameter names such as --bwa_index and --bwa should be reserved.

A reserved ontology list needs to be created. There may be examples from elsewhere that we could use as a starting point. We could also scrape all JSON schema files and build one big list (and link to it in the writing pipelines tutorial). The final product could be a list with clear descriptions that developers can use to guide naming conventions.
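A minimal sketch of the "scrape all JSON schema files and build a big list" idea, assuming the usual nf-core nextflow_schema.json layout in which parameters sit under named groups in `definitions` (or `$defs` in newer schemas), each group holding a `properties` map. The function names are hypothetical. Collecting every description used for a name makes it easy to spot where pipelines already disagree on what a parameter means.

```python
from typing import Iterable, Iterator


def iter_schema_params(schema: dict) -> Iterator[tuple[str, str]]:
    """Yield (name, description) for every parameter in one schema.

    Assumes parameters are grouped under "definitions" or "$defs", each
    group holding a "properties" map; top-level "properties" are included too.
    """
    groups = {**schema.get("definitions", {}), **schema.get("$defs", {})}
    for group in groups.values():
        for name, spec in group.get("properties", {}).items():
            yield name, spec.get("description", "")
    for name, spec in schema.get("properties", {}).items():
        yield name, spec.get("description", "")


def build_param_list(schemas: Iterable[dict]) -> dict[str, set[str]]:
    """Map each parameter name to the set of descriptions used for it."""
    params: dict[str, set[str]] = {}
    for schema in schemas:
        for name, description in iter_schema_params(schema):
            params.setdefault(name, set()).add(description)
    return params
```

The schemas could be loaded with something like `[json.loads(p.read_text()) for p in Path("schemas").glob("*/nextflow_schema.json")]`, where the `schemas/` directory of downloaded files is itself an assumption.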

christopher-hakkaart commented 2 years ago

A secondary objective will be to reserve global samplesheet headers.
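Once a reserved header list exists, checking a samplesheet against it could look like the sketch below. The reserved set here is purely hypothetical (no such list has been agreed yet); it reads only the first CSV row and reports the columns that fall outside the list.

```python
import csv
import io

# Hypothetical reserved headers, for illustration only; no agreed list exists yet.
RESERVED_HEADERS = frozenset({"sample", "fastq_1", "fastq_2"})


def unreserved_headers(samplesheet_text: str, reserved=RESERVED_HEADERS) -> list[str]:
    """Return header columns of a CSV samplesheet that are not reserved."""
    reader = csv.reader(io.StringIO(samplesheet_text))
    header = next(reader, [])  # empty input yields no header
    return [col.strip() for col in header if col.strip() not in reserved]
```

For example, `unreserved_headers("sample,fastq_1,fastq_2,strandedness\n")` returns `["strandedness"]`, flagging a column that would either need adding to the reserved list or renaming.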

awgymer commented 1 year ago

I did a really quick and dirty scrape of the schema.json from the pipelines listed as released on the website. Three of those did not appear to have a schema.json in master and were skipped: mnaseseq, imcyto, slamseq. This leaves 44 pipelines.

Here are all the params that appeared in more than one pipeline (param name, followed by the number of pipelines whose schema includes it):

outdir  44
email   44
custom_config_version   44
custom_config_base  44
config_profile_description  44
config_profile_contact  44
config_profile_url  44
max_cpus    44
max_memory  44
max_time    44
help    44
email_on_fail   44
plaintext_email 44
monochrome_logs 44
tracedir    43
input   42
publish_dir_mode    42
max_multiqc_email_size  40
validate_params 40
show_hidden_params  40
config_profile_name 39
multiqc_config  38
multiqc_title   33
hook_url    29
igenomes_ignore 28
igenomes_base   27
genome  25
version 24
multiqc_logo    24
multiqc_methods_description 24
fasta   23
skip_multiqc    17
save_reference  15
aligner 14
gtf 13
enable_conda    13
hostnames   12
skip_fastqc 12
clip_r1 10
three_prime_clip_r1 10
clip_r2 9
three_prime_clip_r2 9
save_trimmed    9
gff 8
skip_trimming   8
trim_nextseq    7
star_index  7
seq_center  7
skip_qc 7
protocol    6
save_unaligned  6
save_align_intermeds    6
gene_bed    6
bwa_index   6
skip_preseq 5
single_end  5
enzyme  4
trim_fastq  4
star_ignore_sjdbgtf 4
read_length 4
skip_alignment  4
save_merged_fastq   4
blacklist   4
skip_igv    4
skip_peak_qc    4
macs_gsize  4
name    4
database    3
decoy_method    3
precursor_mass_tolerance    3
fragment_mass_tolerance 3
fixed_mods  3
variable_mods   3
min_peptide_length  3
max_peptide_length  3
num_hits    3
subset_max_train    3
klammer 3
description_correct_features    3
quantification_method   3
contrasts   3
singularity_pull_docker_container   3
bowtie2_index   3
save_trimmed_fail   3
skip_cutadapt   3
skip_markduplicates 3
skip_picard_metrics 3
seq_platform    3
keep_dups   3
deseq2_vst  3
skip_deseq2_qc  3
skip_peak_annotation    3
skip_plot_profile   3
root_folder 2
local_input_type    2
add_decoys  2
openms_peakpicking  2
peakpicking_inmemory    2
peakpicking_ms_levels   2
search_engines  2
num_enzyme_termini  2
allowed_missed_cleavages    2
precursor_mass_tolerance_unit   2
fragment_mass_tolerance_unit    2
fragment_method 2
isotope_error_range 2
instrument  2
min_precursor_charge    2
max_precursor_charge    2
max_mods    2
db_debug    2
enable_mod_localization 2
mod_localization    2
luciphor_neutral_losses 2
luciphor_decoy_mass 2
luciphor_decoy_neutral_losses   2
luciphor_debug  2
IL_equivalent   2
posterior_probabilities 2
pp_debug    2
FDR_level   2
train_FDR   2
test_FDR    2
outlier_handling    2
consensusid_algorithm   2
consensusid_considered_top_hits 2
min_consensus_support   2
protein_level_fdr_cutoff    2
protein_quant   2
mass_recalibration  2
transfer_ids    2
targeted_only   2
skip_post_msstats   2
ref_condition   2
enable_qc   2
ptxqc_report_layout 2
skip_pycoqc 2
skip_nanoplot   2
kraken2_db  2
skip_kraken2    2
skip_fastp  2
variant_caller  2
min_mapped_reads    2
mode    2
adapter_fasta   2
save_databases  2
transcript_fasta    2
salmon_index    2
tools   2
trim    2
fai 2
malt_mode   2
stranded    2
skip_quantification 2
skip_bigwig 2
peakcaller  2
annotation_tool 2
with_umi    2
umitools_dedup_stats    2
dragmap 2
skip_tools  2
split_fastq 2
no_intervals    2
snpeff_cache    2
vep_cache   2
dbsnp   2
dbsnp_tbi   2
dict    2
fasta_fai   2
known_indels    2
known_indels_tbi    2
mappability 2
snpeff_db   2
vep_genome  2
vep_species 2
vep_cache_version   2
remove_ribo_rna 2
ribo_database_manifest  2
save_non_ribo_reads 2
bam_csi_index   2
skip_qualimap   2
fasta_index 2
skip_deduplication  2
skip_decoy_generation   2
fragment_size   2
chromap_index   2
keep_multi_map  2
bwa_min_score   2
bamtools_filter_pe_config   2
bamtools_filter_se_config   2
narrow_peak 2
broad_cutoff    2
macs_fdr    2
macs_pvalue 2
min_reps_consensus  2
save_macs_pileup    2
skip_consensus_peaks    2
skip_plot_fingerprint   2
fingerprint_bins    2
krakendb    2
bowtie_index    2
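A tally like the one above could be produced roughly as follows. This is a sketch, not the script used for the scrape; it assumes each pipeline's parameter names have already been extracted into a set, and it counts each pipeline at most once per name.

```python
from collections import Counter


def count_shared_params(
    pipeline_params: dict[str, set[str]], min_pipelines: int = 2
) -> list[tuple[str, int]]:
    """Count how many pipelines declare each parameter name.

    pipeline_params maps a pipeline name to the parameter names in its schema.
    Names declared by fewer than min_pipelines pipelines are dropped; the rest
    are returned most common first.
    """
    counts: Counter[str] = Counter()
    for names in pipeline_params.values():
        counts.update(set(names))  # a pipeline counts once per name
    return [(name, n) for name, n in counts.most_common() if n >= min_pipelines]
```

Setting `min_pipelines=1` returns every name, which would also give the long tail of single-pipeline params.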
ncrna   2

There are over 2000 params that appear in only a single pipeline. I am not sure how many of those are named similarly, but not identically, to params in other pipelines and should perhaps be standardised.
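One way to shortlist similarly named params is a fuzzy string comparison. The sketch below uses Python's `difflib.SequenceMatcher` after lower-casing and dropping underscores; the 0.85 cutoff is an arbitrary assumption that would need tuning, and any hits are candidates for human review rather than confirmed duplicates. Pairs from the table above such as krakendb/kraken2_db and bowtie_index/bowtie2_index are the kind of thing it surfaces.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lower-case and drop underscores so 'kraken_db' and 'krakendb' compare equal."""
    return name.lower().replace("_", "")


def similar_pairs(names, cutoff: float = 0.85) -> list[tuple[str, str, float]]:
    """Return (a, b, ratio) for distinct names whose normalised forms are similar."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if ratio >= cutoff:
            pairs.append((a, b, round(ratio, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Comparing every pair is quadratic, which is still manageable for a couple of thousand names but could be restricted to names sharing a prefix if it became slow.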

maxulysse commented 11 months ago

On top of that, I'd like a global list of meta.map fields.

nf-core / website

Create a global parameters list #1251