mnsmar / clipseqtools

A suite for the analysis of CLIP-Seq datasets.
http://mourelatos.med.upenn.edu/clipseqtools/
12 stars 3 forks source link

CLIPseqtools-compare issues and analyzing replicates #7

Closed bsaleme closed 3 years ago

bsaleme commented 4 years ago

Hello, I was wondering if there is a way in ClipseqTools to analyze biological/technical replicates simultaneously? Or do we have to merge all fastq files prior to analysis?

More importantly, I have been having issues with clipseqtools-compare. It crashes minutes after I start the run and I couldn't get it to work at all. You will see below the errors that I get:

Starting analysis: libraries_overlap_stats Validating arguments Reading sizes for reference alignment sequences Creating reads collection Creating reference reads collection Measuring the overlap of the primary library with the reference Annotating 1 with reference records slice: slice starts out of bounds in pos 0 (start is 15014; source dim 0 runs 0 to -1) at /usr/local/lib64/perl5/PDL/Core.pm line 814. PDL::ANON(PDL=SCALAR(0x55873fccc378), 1, undef) called at /usr/local/share/perl5/CLIPSeqTools/CompareApp/libraries_overlap_stats.pm line 151 CLIPSeqTools::CompareApp::libraries_overlap_stats::ANON(GenOO::Data::DB::DBIC::Species::Schema::Result::sample=HASH(0x55873fed76b0)) called at /usr/local/share/perl5/GenOO/RegionCollection/Type/DBIC.pm line 182 GenOO::RegionCollection::Type::DBIC::foreach_record_on_rname_do(GenOO::RegionCollection::Type::DBIC=HASH(0x55873f856cc8), 1, CODE(0x55873f607130)) called at /usr/local/share/perl5/CLIPSeqTools/CompareApp/libraries_overlap_stats.pm line 155 CLIPSeqTools::CompareApp::libraries_overlap_stats::run(CLIPSeqTools::CompareApp::libraries_overlap_stats=HASH(0x55873dd19d00)) called at /usr/local/share/perl5/CLIPSeqTools/CompareApp/all.pm line 164 CLIPSeqTools::CompareApp::all::run(CLIPSeqTools::CompareApp::all=HASH(0x55873de22e58)) called at /usr/local/bin/clipseqtools-compare line 19

Any advice? Thanks!

mnsmar commented 4 years ago

Hi, CLIPSeqTools does not treat biological replicates in a special way. I usually prefer to run all replicates independently and then compare the final results for reproducibility. Merging the fastq is an option, but then you lose information about reproducibility. For comparing gene, transcript, intron, exon counts across libraries (differential expression) you can use clipseqtools-compare join_tables on the output of clipseqtools count_reads_on_genic_elements to prepare the files for DESeq2 which takes into account replicates.

Regarding the error: I'm not sure what is going on. Do you use custom annotation files? I see that the script complains for the size of chromosome "1" which might indicate that maybe this is not contained in the rname_sizes files that you give?

bsaleme commented 4 years ago

Thank you very much for your response!

As for the error, I was using the annotation files linked on the clipseqtools website (http://mourelatos.med.upenn.edu/clipseqtools/data/), the community downloads hg38. Do you think I should try the hg19 instead?

mnsmar commented 4 years ago

The chromosome name seems to be "1" without the prefix "chr" while the rname_sizes file has "chr1". I do not what is the reason for this discrepancy. Did you re-create the STAR index using a different genome file (e.g. from ENSEMBL). The downloaded hg38 files use the "chr" prefix in all files.

mnsmar commented 3 years ago

Closed due to inactivity.