Open jeffsun905 opened 2 years ago
Hi, currently it is not possible to provide custom options for STAR. I would suggest doing the alignment outside clipseqtools and then running the preprocessing modules individually instead of through the bundled all command.
The STAR options that CLIPSeqTools uses are:
STAR \
--genomeDir [genome] \
--readFilesIn [fastq] \
--runThreadN [threads] \
--outSAMattributes All \
--outFilterMultimapScoreRange 0 \
--alignIntronMax 50000 \
--outFilterMatchNmin 15 \
--outFilterMatchNminOverLread 0.9 \
--readFilesCommand zcat \
--outFileNamePrefix [o_prefix].star_
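Running the alignment yourself means extra STAR options can be appended to the list above, for example the --limitOutSJcollapsed option that STAR's own error message in the original report asks for. The following is a minimal sketch, not part of CLIPSeqTools: the variable names, the default paths, and the value 2000000 are placeholders chosen for illustration, and RUNNER is a hypothetical dry-run switch that defaults to printing the command instead of executing it.

```shell
#!/bin/sh
# Sketch: run STAR outside CLIPSeqTools with the options listed above
# plus one extra option. All paths and values are placeholders.
set -eu

GENOME_DIR=${GENOME_DIR:-/path/to/star_index}   # placeholder STAR index directory
FASTQ=${FASTQ:-reads.fastq.gz}                  # placeholder gzipped FASTQ file
THREADS=${THREADS:-4}
O_PREFIX=${O_PREFIX:-out/reads}
LIMIT_SJ=${LIMIT_SJ:-2000000}                   # illustrative value, tune for your data

# RUNNER=echo (the default here) only prints the command.
# Set RUNNER="" in the environment to actually execute STAR.
RUNNER=${RUNNER-echo}

star_align() {
  $RUNNER STAR \
    --genomeDir "$GENOME_DIR" \
    --readFilesIn "$FASTQ" \
    --runThreadN "$THREADS" \
    --outSAMattributes All \
    --outFilterMultimapScoreRange 0 \
    --alignIntronMax 50000 \
    --outFilterMatchNmin 15 \
    --outFilterMatchNminOverLread 0.9 \
    --readFilesCommand zcat \
    --outFileNamePrefix "${O_PREFIX}.star_" \
    --limitOutSJcollapsed "$LIMIT_SJ"   # extra option, not in the CLIPSeqTools list
}

star_align
```

With the defaults the script prints the full command so it can be inspected first; running it as `RUNNER="" sh script.sh` executes STAR.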
After you do the alignment with STAR, you can run:
clipseqtools-preprocess cleanup_alignment --sam [SAM_FILE_FROM_STAR] --o_prefix [PATH] -v
clipseqtools-preprocess sam_to_sqlite --sam_file [CLEAN_SAM] --database [NEW_DB_FILE] --drop -v
clipseqtools-preprocess annotate_with_genic_elements --database [DB_FILE] --gtf [GTF_FILE] --drop -v
clipseqtools-preprocess annotate_with_file --database [DB_FILE] --a_file [RMSK_FILE] --column rmsk --both_strands -v
clipseqtools-preprocess annotate_with_deletions --database [DB_FILE] --drop -v
clipseqtools-preprocess annotate_with_conservation --database [DB_FILE] --cons_dir [PATH_TO_CONSERVATION_FILES] --rname_sizes [FILE_WITH_CHROMOSOME_SIZES] --drop -v
Thank you for the quick response! The alignment is done. However, the following step failed:
clipseqtools-preprocess sam_to_sqlite --sam_file [CLEAN_SAM] --database [NEW_DB_FILE] --drop -v
I assume [CLEAN_SAM] is the final output of "cleanup_alignment" (the file ending in ..sorted.collapsed.sam) and [NEW_DB_FILE] is a file name of my choosing with a ".db" extension. The command also requires the "--table" option. I tried different table names, but none of them worked. Thank you!
Was there an error message?
--table is optional, as the default is 'sample'. However, you can choose any name you like.
Apparently it is a required parameter in the version I have (1.0.0). If I don't provide "--table", I get this:
clipseqtools-preprocess sam_to_sqlite --sam_file reads.adtrim.star_Aligned.out.single.sorted.collapsed.sam --database reads.adtrim.star_Aligned.out.single.sorted.collapsed.db --drop -v
Required option 'table' missing
If I provide a name that does not contain a ".", it works. I thought the table name had to match the db name, which contains a ".". Thanks!
I see. Well, apparently clipseqtools-preprocess sam_to_sqlite is the only tool that does not use the default value "sample". I'll need to fix that. Please remember to pass the --table option, with whatever name you chose at this step, to the remaining tools.
It is better to use "--table sample"; otherwise the next step will fail, as it looks for the default "sample" table.
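Putting the points from this thread together, the preprocessing steps can be chained in one script with "--table sample" passed explicitly to sam_to_sqlite. This is only a sketch under assumptions: every path variable is a placeholder, CLEAN_SAM must be set to whatever file cleanup_alignment actually produces, check_table_name is a hypothetical helper that encodes the observation above that names containing "." do not work, and RUNNER is a hypothetical dry-run switch that prints the commands by default.

```shell
#!/bin/sh
# Sketch: run the individual preprocessing steps after an external STAR run.
set -eu

RUNNER=${RUNNER-echo}                 # default: print commands; set RUNNER="" to execute
STAR_SAM=${STAR_SAM:-star_output.sam} # placeholder: SAM file from STAR
O_PREFIX=${O_PREFIX:-out/}            # placeholder output prefix
CLEAN_SAM=${CLEAN_SAM:-clean.sam}     # placeholder: set to the cleanup_alignment output
DB=${DB:-reads.db}                    # placeholder database file
GTF=${GTF:-genes.gtf}
RMSK=${RMSK:-rmsk.bed}
CONS_DIR=${CONS_DIR:-conservation/}
CHR_SIZES=${CHR_SIZES:-chrom.sizes}
TABLE=${TABLE:-sample}                # the default the other tools look for

# Reject empty names and names containing "." before touching the database.
check_table_name() {
  case "$1" in
    "")  echo "table name must not be empty" >&2; return 1 ;;
    *.*) echo "table name '$1' must not contain '.'" >&2; return 1 ;;
  esac
}

# Thin wrapper so every step honours the dry-run switch.
preprocess() {
  $RUNNER clipseqtools-preprocess "$@"
}

check_table_name "$TABLE"
preprocess cleanup_alignment --sam "$STAR_SAM" --o_prefix "$O_PREFIX" -v
preprocess sam_to_sqlite --sam_file "$CLEAN_SAM" --database "$DB" --table "$TABLE" --drop -v
preprocess annotate_with_genic_elements --database "$DB" --gtf "$GTF" --drop -v
preprocess annotate_with_file --database "$DB" --a_file "$RMSK" --column rmsk --both_strands -v
preprocess annotate_with_deletions --database "$DB" --drop -v
preprocess annotate_with_conservation --database "$DB" --cons_dir "$CONS_DIR" --rname_sizes "$CHR_SIZES" --drop -v
```

If a table name other than "sample" is used, the remaining tools would also need the same --table value, as noted above; the sketch keeps the default so that they do not.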
In the analysis modules, I found that "nmer_enrichment_over_shuffled" is extremely slow: for typical samples it takes 5-7 days. Is that normal? Is there a way to speed it up, or can this step be skipped without affecting the comparative analysis, i.e., clipseqtools-compare?
Yes, unfortunately it is slow. It can be skipped without affecting the comparative analysis.
Thank you for the confirmation.
Hi,
Thank you for the nice tool. I am using it on our data, but it failed at the alignment step (pre-process all). In the STAR log I saw the following:
EXITING because of fatal error: buffer size for SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed
Jul 01 09:22:45 ...... FATAL ERROR, exiting
Is there a way to provide different parameters to STAR? The help manual lists "--config Path to command config file". Is that the place to provide them? Do you have an example config file (for modifying STAR or other parameters)?
Thanks!