mnsmar / clipseqtools

A suite for the analysis of CLIP-Seq datasets.
http://mourelatos.med.upenn.edu/clipseqtools/

Custom STAR parameters with --config? #11

Open jeffsun905 opened 2 years ago

jeffsun905 commented 2 years ago

Hi,

Thank you for the nice tool. I am using it on our data, but it failed at the alignment step (pre-process all). In the STAR log, I saw the following: "EXITING because of fatal error: buffer size for SJ output is too small Solution: increase input parameter --limitOutSJcollapsed Jul 01 09:22:45 ...... FATAL ERROR, exiting"

Is there a way to pass different parameters to STAR? The help manual lists "--config Path to command config file". Is that the place to provide them? Do you have an example config file (to modify STAR or other parameters)?

Thanks!

mnsmar commented 2 years ago

Hi, currently it is not possible to provide custom options for STAR. I suggest doing the alignments outside clipseqtools and then using the preprocessing modules individually instead of bundled with the all command.

The STAR options that CLIPSeqTools uses are:

STAR \
    --genomeDir [genome] \
    --readFilesIn [fastq] \
    --runThreadN [threads] \
    --outSAMattributes All \
    --outFilterMultimapScoreRange 0 \
    --alignIntronMax 50000 \
    --outFilterMatchNmin 15 \
    --outFilterMatchNminOverLread 0.9 \
    --readFilesCommand zcat \
    --outFileNamePrefix [o_prefix].star_
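
For the error reported above, one option is to run STAR by hand with the same options plus the flag that the log itself suggests, --limitOutSJcollapsed. The following is only a sketch under stated assumptions and is not part of CLIPSeqTools: the paths, the RUN switch, and the value 5000000 are placeholders picked for illustration (STAR's documented default is 1000000; check the manual for your version). It prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Dry run by default: the command is only printed. Set RUN=1 to execute it.
if [ "${RUN:-0}" = 1 ]; then runner=; else runner=echo; fi

# Hypothetical inputs; replace with your own paths.
genome_dir="${GENOME_DIR:-/path/to/star_index}"
fastq="${FASTQ:-reads.adtrim.fastq.gz}"
o_prefix="${O_PREFIX:-reads.adtrim}"
threads="${THREADS:-4}"
# Assumed value, larger than the STAR default; tune it for your data.
sj_limit="${SJ_LIMIT:-5000000}"

run_star() {
  # Same options as listed in the thread, plus --limitOutSJcollapsed.
  $runner STAR \
    --genomeDir "$genome_dir" \
    --readFilesIn "$fastq" \
    --runThreadN "$threads" \
    --outSAMattributes All \
    --outFilterMultimapScoreRange 0 \
    --alignIntronMax 50000 \
    --outFilterMatchNmin 15 \
    --outFilterMatchNminOverLread 0.9 \
    --readFilesCommand zcat \
    --limitOutSJcollapsed "$sj_limit" \
    --outFileNamePrefix "${o_prefix}.star_"
}

run_star
```

Running it as `RUN=1 SJ_LIMIT=2000000 sh run_star.sh` (script name hypothetical) would execute STAR with a different limit.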

After you do the alignment with STAR, you can run:

clipseqtools-preprocess cleanup_alignment --sam [SAM_FILE_FROM_STAR] --o_prefix [PATH] -v

clipseqtools-preprocess sam_to_sqlite --sam_file [CLEAN_SAM] --database [NEW_DB_FILE] --drop -v

clipseqtools-preprocess annotate_with_genic_elements --database [DB_FILE] --gtf [GTF_FILE] --drop -v

clipseqtools-preprocess annotate_with_file --database [DB_FILE] --a_file [RMSK_FILE] --column rmsk --both_strands -v

clipseqtools-preprocess annotate_with_deletions --database [DB_FILE] --drop -v

clipseqtools-preprocess annotate_with_conservation --database [DB_FILE] --cons_dir [PATH_TO_CONSERVATION_FILES] --rname_sizes [FILE_WITH_CHROMOSOME_SIZES] --drop -v

jeffsun905 commented 2 years ago

Thank you for the quick response! I am done with the alignment. However, I could not run the following step:

clipseqtools-preprocess sam_to_sqlite --sam_file [CLEAN_SAM] --database [NEW_DB_FILE] --drop -v

I assume [CLEAN_SAM] is the final output of "cleanup_alignment" (which is ..sorted.collapsed.sam) and [NEW_DB_FILE] is a file name of my choosing with a ".db" extension. The command also requires the "--table" option. I tried different table names, but none of them worked. Thank you!

mnsmar commented 2 years ago

Was there an error message?

--table is optional as the default is 'sample'. However, you can choose any name you like.

jeffsun905 commented 2 years ago

Apparently, it is a required parameter in the version I have (1.0.0). If I don't provide "--table", I get this:

clipseqtools-preprocess sam_to_sqlite --sam_file reads.adtrim.star_Aligned.out.single.sorted.collapsed.sam --database reads.adtrim.star_Aligned.out.single.sorted.collapsed.db --drop -v
Required option 'table' missing

If I provide a name that does not contain a ".", it works. I had thought the table name should match the db name, which contains a ".". Thanks!

mnsmar commented 2 years ago

I see. Well apparently clipseqtools-preprocess sam_to_sqlite is the only tool that does not use the default value "sample". I'll need to fix that. Please remember to use the --table option with whatever name you chose at this step for the remaining tools.

jeffsun905 commented 2 years ago

It is better to use "--table sample"; otherwise the next step fails, because it looks for the default "sample" table.
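
Putting the points from this exchange together, a wrapper could pass --table sample explicitly and reject table names containing a ".", which was the failure seen above. This is a rough sketch, not an official script: the file names, the RUN switch, and the helper functions are hypothetical, and only three of the steps listed earlier are shown. By default it just prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN=1 to execute them.
if [ "${RUN:-0}" = 1 ]; then runner=; else runner=echo; fi

# Hypothetical file names; replace with your own.
clean_sam="${CLEAN_SAM:-reads.adtrim.star_Aligned.out.single.sorted.collapsed.sam}"
db="${DB_FILE:-reads.adtrim.db}"
gtf="${GTF_FILE:-genes.gtf}"
# "sample" is the default table name that the later steps look for.
table="${TABLE:-sample}"

# Reject empty names and names with a "." (the latter failed in this thread).
check_table_name() {
  case "$1" in
    ''|*.*) echo "invalid table name: '$1'" >&2; return 1 ;;
    *) return 0 ;;
  esac
}

load_and_annotate() {
  check_table_name "$table" || return 1
  # sam_to_sqlite in 1.0.0 requires --table; the annotation steps
  # fall back to "sample" when --table is not given.
  $runner clipseqtools-preprocess sam_to_sqlite --sam_file "$clean_sam" \
      --database "$db" --table "$table" --drop -v &&
  $runner clipseqtools-preprocess annotate_with_genic_elements \
      --database "$db" --gtf "$gtf" --drop -v &&
  $runner clipseqtools-preprocess annotate_with_deletions \
      --database "$db" --drop -v
}

load_and_annotate
```

If a different table name is used, the same name would presumably have to be passed with --table to each later step, as noted above.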

jeffsun905 commented 2 years ago

Among the analysis modules, I found that "nmer_enrichment_over_shuffled" is very slow: for typical samples it takes 5-7 days. Is that normal? Is there a way to speed it up, or can this step be skipped without affecting the comparative analysis, i.e., clipseqtools-compare?

mnsmar commented 2 years ago

Yes, unfortunately it is slow. It can be skipped without affecting the comparative analysis.

jeffsun905 commented 2 years ago

Thank you for the confirmation.