t-neumann / slamdunk

Streamlining SLAM-seq analysis with ultra-high sensitivity
GNU Affero General Public License v3.0

Using slamdunk all, the "_mapped_filtered.bam" file in the filter directory is only 8 kB? #118

Closed: lillianj97 closed this issue 6 months ago

lillianj97 commented 2 years ago

This is my command: slamdunk all -r /home/ubuntu/data/hg38.fa -b /home/ubuntu/data/slamseq/hg38_3UTR -o /media/volume/slam_result -t 8 /home/ubuntu/data/trimfile/DMSO_8h_trim_rep1.fq.gz

Then I went into the filter directory and found that the "DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered.bam" file is just 8 kB. I don't know why. Could you tell me what problem I ran into? How can I solve it?

lillianj97 commented 2 years ago

The slamdunk version is 0.4.3. I created a conda environment and installed the software in it.

t-neumann commented 2 years ago

Hi - are there any reads in the bam file or none at all? What does samtools flagstat DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered.bam tell you?

lillianj97 commented 2 years ago


This is the output:

3656564 + 0 in total (QC-passed reads + QC-failed reads)
3656564 + 0 primary
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
3656564 + 0 mapped (100.00% : N/A)
3656564 + 0 primary mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

t-neumann commented 2 years ago

And how many reads are in the original /home/ubuntu/data/trimfile/DMSO_8h_trim_rep1.fq.gz file?

lillianj97 commented 2 years ago


I used zcat to check: 8336090.
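The zcat check can also be scripted. A minimal sketch, assuming the standard 4-line FASTQ layout (header, sequence, "+", quality), so the read count is the line count divided by 4 (the shell equivalent is zcat file.fq.gz | wc -l, divided by 4). count_fastq_reads is a hypothetical helper name, not part of slamdunk.

```python
import gzip


def count_fastq_reads(path: str) -> int:
    """Count reads in a plain or gzipped FASTQ file.

    ASSUMPTION: every record spans exactly 4 lines, so reads = lines / 4.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A remainder means the file is truncated or not 4-line FASTQ.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

For example, count_fastq_reads("/home/ubuntu/data/trimfile/DMSO_8h_trim_rep1.fq.gz") would return the number of reads in the trimmed input.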

t-neumann commented 2 years ago

That sounds low but could be reasonable. We usually have 30-50% counted reads.
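The arithmetic behind "low but could be reasonable" can be made explicit from the flagstat output and the zcat count reported earlier. A minimal sketch, assuming 8336090 is a read count rather than a raw line count (it is not divisible by 4, which fits that reading). Because flagstat reports 0 secondary and 0 supplementary alignments, its "in total" figure equals the number of reads in the filtered BAM. flagstat_total is a hypothetical helper.

```python
import re


def flagstat_total(flagstat_text: str) -> int:
    """Return the QC-passed 'in total' count from samtools flagstat output."""
    match = re.search(r"^(\d+) \+ \d+ in total", flagstat_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'in total' line found in flagstat output")
    return int(match.group(1))


# Numbers reported earlier in this thread.
flagstat_text = "3656564 + 0 in total (QC-passed reads + QC-failed reads)\n"
reads_in_bam = flagstat_total(flagstat_text)
reads_in_fastq = 8336090

fraction = reads_in_bam / reads_in_fastq
print(f"{reads_in_bam} / {reads_in_fastq} = {fraction:.1%}")  # 3656564 / 8336090 = 43.9%
```

Roughly 44% of the input reads end up in the filtered BAM, which lies inside the 30-50% range mentioned in the comment.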

lillianj97 commented 2 years ago


So does it fail because too many reads are filtered out? Is there any way to figure this out?

lillianj97 commented 2 years ago

Sorry, I think I asked the wrong question. When I use "alleyoop read-separator" to process the DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered.bam file, the resulting "DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered_TCReads.bam" file is just 8 kB. That is strange. My command: alleyoop read-separator -o /media/volume/slam_result/separate_lable -r /media/volume/STAR_index/Homo_sapiens.GRCh38.dna.primary_assembly.fa -t 8 /media/volume/slam_result/filter/DMSO_8h_trim_rep3_slamdunk_mapped_filtered.bam

lillianj97 commented 2 years ago

More detail:
DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered_TCReads.bam.bai 4 kB
DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered_TCReads.bam 8 kB
DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered_read_separator.log 0 kB
DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered_backgroundReads.bam.bai 3213 kB
DMSO_8h_trim_rep1.fq_slamdunk_mapped_filtered_backgroundReads.bam 134625 kB
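Judging by the output names, read-separator splits reads into those carrying T>C conversions (TCReads) and the rest (backgroundReads), so a tiny TCReads.bam next to a large backgroundReads.bam suggests that almost no reads were classified as converted. The sketch below only illustrates that classification idea on already aligned, gap-free sequences; count_t_to_c, is_tc_read and the min_conversions=1 threshold are hypothetical and are not alleyoop's actual code or defaults.

```python
def count_t_to_c(ref: str, read: str, reverse_strand: bool = False) -> int:
    """Count T>C conversions between a reference and a read aligned base by base.

    For reads from the reverse strand, a T>C conversion appears as A>G
    relative to the forward reference sequence.
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have the same aligned length")
    ref_base, read_base = ("A", "G") if reverse_strand else ("T", "C")
    return sum(
        1 for r, q in zip(ref.upper(), read.upper()) if r == ref_base and q == read_base
    )


def is_tc_read(ref: str, read: str, reverse_strand: bool = False, min_conversions: int = 1) -> bool:
    # HYPOTHETICAL rule: at least `min_conversions` T>C conversions -> "TC read",
    # otherwise the read would go to the background set.
    return count_t_to_c(ref, read, reverse_strand) >= min_conversions


print(is_tc_read("ACGTTA", "ACGCTA"))  # True: one T>C conversion
print(is_tc_read("ACGTTA", "ACGTTA"))  # False: no conversions
```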

The BAM file is too small, which is a problem for running "bamCoverage" to convert it to a bigWig (bw) file.

t-neumann commented 2 years ago

That looks indeed quite low - can you maybe run multiQC on it to check the conversion rates?

lillianj97 commented 2 years ago


Sorry, I'm not sure which file to check: the filtered_TCReads.bam file, the trimmed fq file, or the bigWig file?

t-neumann commented 2 years ago

Ah sorry - just on the plain filtered BAM file, and ideally try alleyoop rates as documented here.


Then you can simply run multiqc in the same folder to summarise the results.

lillianj97 commented 2 years ago

slamdunk rates v0.4.3

  A a C c G g T t N n
A 0 0 0 0 0 0 0 0 0 0
C 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0
T 0 0 0 0 0 0 0 0 0 0
N 0 0 0 0 0 0 0 0 0 0

This is the command: alleyoop rates -o /media/volume/slam_result/separate_lable -r /media/volume/STAR_index/Homo_sapiens.GRCh38.dna.primary_assembly.fa -t 8 DMSO_8h_rep1.fastq_slamdunk_mapped_filtered.bam

The PDF does not contain any histogram.

This is the log file:
/home/ubuntu/anaconda3/envs/slamdunk/lib/python3.10/site-packages/slamdunk/plot/compute_overall_rates.R -f /media/volume/slam_result/separate_lable/DMSO_8h_rep1.fastq_slamdunk_mapped_filtered_overallrates.csv -n DMSO_8h_rep1.fastq_slamdunk_mapped_filtered -O /media/volume/slam_result/separate_lable/8h_rep1.fastq_slamdunk_mapped_filtered_overallrates.pdf
Warning messages:
1: Use of printTab$y is discouraged. Use y instead.
2: Removed 24 rows containing missing values (position_stack).
3: Removed 24 rows containing missing values (geom_point).
4: Removed 24 rows containing missing values (geom_text).
null device
1
Skipped computing overall rates for file DMSO_8h_rep1.fastq_slamdunk_mapped_filtered.bam
Skipped computing overall rate pdfs for file DMSO_8h_rep1.fastq_slamdunk_mapped_filtered.bam
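An all-zero base matrix leaves every mismatch rate undefined, because each rate divides by the total count for its reference base. A minimal sketch, ASSUMING rows are reference bases, upper-case columns are forward-strand reads and lower-case columns are reverse-strand reads; this is an assumption about the layout, not a description of how alleyoop or its R plotting script computes rates. mismatch_rates is a hypothetical helper. With 12 mismatch types on 2 strands, the zero matrix yields 24 undefined values, which would be consistent with (though not proof of) the "Removed 24 rows containing missing values" warnings in the log.

```python
from __future__ import annotations

BASES = "ACGT"


def mismatch_rates(matrix: dict[str, dict[str, int]]) -> dict[str, float | None]:
    """Per-mismatch rates from a reference-base x read-base count matrix.

    ASSUMPTION: rows are reference bases, upper-case columns are forward-strand
    reads and lower-case columns are reverse-strand reads. N is ignored.
    A rate is count(ref>alt) / total for that reference base on that strand,
    or None when that total is zero (nothing to divide by).
    """
    rates: dict[str, float | None] = {}
    for strand, cols in (("plus", BASES), ("minus", BASES.lower())):
        for ref in BASES:
            total = sum(matrix[ref][c] for c in cols)
            for alt in BASES:
                if alt == ref:
                    continue
                col = alt if strand == "plus" else alt.lower()
                rates[f"{ref}>{alt} {strand}"] = matrix[ref][col] / total if total else None
    return rates


# The all-zero matrix reported in this thread (rows A, C, G, T, N).
zero_matrix = {ref: {c: 0 for c in "AaCcGgTtNn"} for ref in "ACGTN"}
undefined = [name for name, rate in mismatch_rates(zero_matrix).items() if rate is None]
print(len(undefined))  # 24 (12 mismatch types x 2 strands)
```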

t-neumann commented 2 years ago

OK, ignore the missing PDF for now - MultiQC should still work on those text files.

lillianj97 commented 2 years ago


Should I use MultiQC to check "/home/ubuntu/data/trimfile/DMSO_8h_trim_rep1.fq.gz" or "DMSO_8h_rep1.fastq_slamdunk_mapped_filtered.bam"?

t-neumann commented 2 years ago


Move into the folder where you produced the output of alleyoop rates and simply run "multiqc ." there; the dot stands for the current folder.

lillianj97 commented 2 years ago


t-neumann commented 2 years ago

Hm ok something went wrong there. How do your count files look - are they also all filled with 0s?

lillianj97 commented 2 years ago


multiqc_slamdunk_readrates_minus.txt:
Sample A>C A>G A>T C>A C>G C>T G>A G>C G>T T>A T>C T>G
DMSO_8h_rep1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

multiqc_slamdunk_readrates_plus.txt:
Sample A>C A>G A>T C>A C>G C>T G>A G>C G>T T>A T>C T>G
DMSO_8h_rep1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

lillianj97 commented 2 years ago

It's strange: "DMSO_8h_rep1.fastq_slamdunk_mapped_filtered.bam" is not empty, it is 128457 kB. Why does the next step give 0?

t-neumann commented 2 years ago

Yeah, that's why I would like you to look at 1-2 count files in the count subdirectory and see if they are all 0.

lillianj97 commented 2 years ago


Which one is the "count subdirectory"?

Is there any way to figure out this problem?

t-neumann commented 2 years ago

That's what I'm trying to do here...

t-neumann commented 2 years ago

The count directory is produced by slamdunk all and contains the corresponding count files.

lillianj97 commented 2 years ago

OK, I went into the count directory created by slamdunk all and ran this command: multiqc /media/volume/slam_result/count/count. This time I got nothing:

/// MultiQC πŸ” | v1.13

| multiqc | Search path : /media/volume/slam_result/count/count | searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 4/4 | multiqc | No analysis results found. Cleaning up.. | multiqc | MultiQC complete

lillianj97 commented 2 years ago

The 4 files are:
DMSO_8h_rep1.fastq_slamdunk_mapped_filtered_tcount.log
DMSO_8h_rep1.fastq_slamdunk_mapped_filtered_tcount.tsv
DMSO_8h_rep1.fastq_slamdunk_mapped_filtered_tcount_mins.bedgraph
DMSO_8h_rep1.fastq_slamdunk_mapped_filtered_tcount_plus.bedgraph

t-neumann commented 2 years ago

What I meant is to look at the ReadCount column of the *tcount.tsv files and check whether any reads are counted at all. If the BAM file contains reads but none are counted, that would point to a problem with the BED file.
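One common kind of BED problem is a chromosome-naming mismatch between the annotation and the BAM (for example chr1 vs 1); whether that applies here is not established in this thread. A minimal sketch that compares BED column 1 with the @SQ SN: names of a SAM header, such as the text printed by samtools view -H. bed_contigs and bam_header_contigs are hypothetical helpers, and the "1"-style header in the example is HYPOTHETICAL, chosen only to show what a mismatch would look like.

```python
from __future__ import annotations


def bed_contigs(bed_text: str) -> set[str]:
    """Collect chromosome names (column 1) from BED text, skipping header lines."""
    names = set()
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def bam_header_contigs(sam_header_text: str) -> set[str]:
    """Collect reference names from the @SQ lines of a SAM header."""
    names = set()
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


bed = "chr1\t67092164\t67093004\tXM_011541469.2_utr3_0_0_chr1_67092165_r\t0\t-\n"
# HYPOTHETICAL header using Ensembl-style names, to illustrate a mismatch.
header = "@HD\tVN:1.6\n@SQ\tSN:1\tLN:248956422\n"
missing = bed_contigs(bed) - bam_header_contigs(header)
print(sorted(missing))  # ['chr1']: BED contigs absent from the BAM header
```

An empty result means every BED chromosome name also appears in the BAM header.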

lillianj97 commented 2 years ago

I have checked DMSO_8h_rep1.fastq_slamdunk_mapped_filtered_tcount.tsv; it is not empty:

#slamdunk v0.4.3 3 sample info: GSM4746107_22RV1_DMSO_8h_rep1.fastq 0 pulse 0
#annotation: hg38_3UTR 3af4e2c457787d28967fd5cbbf887c11
Chromosome Start End Name Length Strand ConversionRate ReadsCPM Tcontent CoverageOnTs ConversionsOnTs ReadCount TcReadCount multimapCount ConversionRateLower ConversionRateUpper
chr1 67092164 67093004 XM_011541469.2_utr3_0_0_chr1_67092165_r 840 - 0 0 286 0 0 0 0 0 -1.0 -1.0
chr1 67092164 67093004 XM_017001276.2_utr3_0_0_chr1_67092165_r 840 - 0 0 286 0 0 0 0 0 -1.0 -1.0
chr1 67092164 67093004 XM_011541467.2_utr3_0_0_chr1_67092165_r 840 - 0 0 286 0 0 0 0 0 -1.0 -1.0
chr1 67092164 67093579 NM_001276352.2_utr3_0_0_chr1_67092165_r 1415 - 0 0 455 0 0 0 0 0 -1.0 -1.0
chr1 67092164 67093004 NM_001276351.2_utr3_0_0_chr1_67092165_r 840 - 0 0 286 0 0 0 0 0 -1.0 -1.0
chr1 67092164 67093004 XM_011541465.3_utr3_0_0_chr1_67092165_r 840 - 0 0 286 0 0 0 0 0 -1.0 -1.0
chr1 67092164 67093004 XM_011541466.3_utr3_0_0_chr1_67092165_r 840 - 0 0 286 0 0 0 0 0 -1.0 -1.0
chr1 67092164 67093604 NR_075077.2_utr3_0_0_chr1_67092165_r 1440 - 0 0 457 0 0 0 0 0 -1.0 -1.0
chr1 67096251 67096321 NR_075077.2_utr3_1_0_chr1_67096252_r 70 - 0 0 18 0 0 0 0 0 -1.0 -1.0
chr1 67103237 67103382 NR_075077.2_utr3_2_0_chr1_67103238_r 145 - 0 0 44 0 0 0 0 0 -1.0 -1.0
chr1 67111576 67111644 NR_075077.2_utr3_3_0_chr1_67111577_r 68 - 0 0 18 0 0 0 0 0 -1.0 -1.0
chr1 67113613 67113756 NR_075077.2_utr3_4_0_chr1_67113614_r 143 - 0 0 43 0 0 0 0 0 -1.0 -1.0
chr1 67115351 67115464 NR_075077.2_utr3_5_0_chr1_67115352_r 113 - 0 0 31 0 0 0 0 0 -1.0 -1.0
chr1 67125751 67125909 NR_075077.2_utr3_6_0_chr1_67125752_r 158 - 0 0 37 0 0 0 0 0 -1.0 -1.0
chr1 67127165 67127257 NR_075077.2_utr3_7_0_chr1_67127166_r 92 - 0 0 20 0 0 0 0 0 -1.0 -1.0

lillianj97 commented 2 years ago

The file "slamdunk_mapped_filtered_tcount.tsv" is 30377 kB, so it is not empty.

lillianj97 commented 2 years ago

I downloaded the hg38 3' UTR file from UCSC.

t-neumann commented 2 years ago

Hm, the file looks quite empty to me: the ReadCount column shows all 0s. Is this the case for all genes?
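Checking "all genes" by eye is impractical for a 30 MB table, so a short script can tally the ReadCount column instead. A minimal sketch, assuming the file is tab-separated, that its header lines start with "#" as in the excerpt above, and that the first non-"#" line holds the column names. summarize_read_counts is a hypothetical helper, not part of slamdunk.

```python
from __future__ import annotations

import csv


def summarize_read_counts(tsv_text: str) -> tuple[int, int, int]:
    """Return (rows, rows with ReadCount > 0, summed ReadCount) for a tcount.tsv."""
    # Skip the '#'-prefixed slamdunk/annotation header lines and blank lines.
    data_lines = [line for line in tsv_text.splitlines() if line and not line.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter="\t")
    rows = covered = total = 0
    for row in reader:
        count = int(row["ReadCount"])
        rows += 1
        if count > 0:
            covered += 1
        total += count
    return rows, covered, total


HEADER = [
    "Chromosome", "Start", "End", "Name", "Length", "Strand", "ConversionRate",
    "ReadsCPM", "Tcontent", "CoverageOnTs", "ConversionsOnTs", "ReadCount",
    "TcReadCount", "multimapCount", "ConversionRateLower", "ConversionRateUpper",
]
FIRST_ROW = [
    "chr1", "67092164", "67093004", "XM_011541469.2_utr3_0_0_chr1_67092165_r",
    "840", "-", "0", "0", "286", "0", "0", "0", "0", "0", "-1.0", "-1.0",
]
example = "\n".join([
    "#annotation:\thg38_3UTR\t3af4e2c457787d28967fd5cbbf887c11",
    "\t".join(HEADER),
    "\t".join(FIRST_ROW),
])
print(summarize_read_counts(example))  # (1, 0, 0)
```

On the real file, something like summarize_read_counts(open("DMSO_8h_rep1.fastq_slamdunk_mapped_filtered_tcount.tsv").read()) would show whether any 3' UTR received reads; a second value of 0 means no gene was counted at all.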