hasindu2008 / f5c

Ultra-fast methylation calling and event alignment tool for nanopore sequencing data (supports CUDA acceleration)
https://hasindu2008.github.io/f5c/docs/overview
MIT License
144 stars 26 forks source link

Incorrect header with correct header #96

Closed Coracollar closed 2 years ago

Coracollar commented 3 years ago

Hi, I'm trying to call methylation from a file obtained by tldr call meth that looks like: chromosome strand start end read_name log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands num_motifs sequence 9433d43d-aae9-4757-a9a8-7e980999e835 - 45 45 03de98f9-9155-475e-b80c-5952fa449927 3.32 -79.74 -83.06 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 + 45 45 0563bcf3-41de-4f3a-bed3-bacf6990de67 4.65 -133.14 -137.79 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 + 45 45 083364bd-748d-4ba4-8c7a-07918d67a595 5.85 -100.39 -106.24 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 - 45 45 53a2038a-376f-4571-8ef3-12fcf593f773 -1.75 -126.39 -124.63 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 - 45 45 6948bfe3-058d-4eaf-b3a2-2284eaabf892 6.49 -104.35 -110.84 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 - 45 45 7203447c-fb52-4b39-b262-943bd2845e83 7.44 -139.72 -147.16 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 + 45 45 81c21fe2-04e6-4802-9bc4-e93d19432f00 7.21 -108.18 -115.40 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 - 45 45 94c79f9d-c701-41a2-be8a-04309b46a44b -1.02 -154.81 -153.79 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 - 45 45 96124793-5040-47a4-a5e6-3eeebb880496 7.31 -115.83 -123.14 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 + 45 45 af21be23-6ea9-4b93-8db5-532e9ed885fc 5.07 -168.89 -173.95 1 1 AAATACGCCAG 9433d43d-aae9-4757-a9a8-7e980999e835 - 45 45 bd9846ed-9973-488a-b346-2e5ac1cc52c3 2.98 -86.21 -89.19 1 1 AAATACGCCAG and have this error

f5c meth-freq -i C0_tldr.te.meth.tsv Incorrect header: chromosome strand start end read_name log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands num_motifs sequence

When using calculate_methylation_frequency.py I get an error in num_motifs, when again num_motifs does exist.

calculate_methylation_frequency.py C0_tldr.te.meth.tsv Traceback (most recent call last): File "/usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/nanopolish/0.13.2-python-3.7.4/scripts/calculate_methylation_frequency.py", line 41, in num_sites = int(record['num_motifs']) KeyError: 'num_motifs'

awk '{print $10}' C0_tldr.te.meth.tsv num_motifs 1 1 1 1

Any idea what might be happening?

hasindu2008 commented 3 years ago

Hi

Could you do a: head -100 C0_tldr.te.meth.tsv > C0_tldr_head.te.meth.tsv and attach the generated C0_tldr_head.te.meth.tsv file so I could have a look? Also, could you tell me the command and parameters you used to generate the C0_tldr.te.meth.tsv?

Coracollar commented 3 years ago

Hi, For calling methilation I used:

module load nanopolish/0.13.2-python-3.7.4 module load tabix/0.2.6 export HDF5_PLUGIN_PATH=/home/coracollar/lib/plugins

while read p; do ./callmeth.sh basecalls.fastq.gz \ tldrdirectory "$p" \ C0 done <UUID_allsamples.txt

Gettig several files (using f5c on this individual files also does not work).

Then I put together all individual calls into one with header: echo "chromosome strand start end read_name log_lik_ratio log_lik_methylated log_lik_unmethylated num_calling_strands num_motifs sequence" > headerfile while read p; do zcat C0_mappings_minimap.sorted_C1_mappings_minimap.sorted_C2.sorted_C3_minimap2.sorted_CORT12_mappings_minimap.sorted_CORT13_mappings_minimap.sorted_CORT14_mappings_minimap.sorted_CORT15_mappings_minimap.sorted/C1_mappings_minimap.sorted.$p.te.meth.tsv.gz | tail -n +2; done tmpfile cat headerfile tmpfile > C0_tldr.te.meth.tsv.gz rm tmpfile

Here I attach the head of my file.

Thanks, Cora

On 30 Nov 2021, at 2:43 pm, Hasindu Gamaarachchi @.***> wrote:

Hi

Could you do a: head -100 C0_tldr.te.meth.tsv > C0_tldr_head.te.meth.tsv and attach the generated C0_tldr_head.te.meth.tsv file so I could have a look? Also, could you tell me the command and parameters you used to generate the C0_tldr.te.meth.tsv?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/hasindu2008/f5c/issues/96#issuecomment-982648577, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANZ7PMFEURTN4OZZ4GCP4VLUOTIHDANCNFSM5JB2RNJA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

hasindu2008 commented 3 years ago

Hi Can I know what are the commands and parameters inside your callmeth.sh ?

Could you please replace your header generating a command to :

echo -n -e "chromosome\tstrand\tstart\tend\tread_name\tlog_lik_ratio\tlog_lik_methylated\tlog_lik_unmethylated\tnum_calling_strands\tnum_motifs\tsequence\n" > headerfile

and see if it solves the problem?

Coracollar commented 3 years ago

https://github.com/adamewing/tldr/blob/master/scripts/tldr_callmeth.sh https://github.com/adamewing/tldr/blob/master/scripts/tldr_callmeth.sh

It does work!!! Thanks

On 30 Nov 2021, at 3:05 pm, Hasindu Gamaarachchi @.***> wrote:

Hi Can I know what are the commands and parameters inside your callmeth.sh ?

Could you please replace your header generating a command to :

echo -n -e "chromosome\tstrand\tstart\tend\tread_name\tlog_lik_ratio\tlog_lik_methylated\tlog_lik_unmethylated\tnum_calling_strands\tnum_motifs\tsequence\n" > headerfile and see if it solves the problem?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/hasindu2008/f5c/issues/96#issuecomment-982666890, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANZ7PMC2GV6OISIOWTBL54DUOTKZLANCNFSM5JB2RNJA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

hasindu2008 commented 3 years ago

By the way, you can directly use f5c to index and call methylation. Then it will be much faster.

f5c index -t 32 -p32  basecalls.fastq.gz -d tldrdirectory 

-t is for threads and -p is for I/O processes. You can change 32 to the number of cores on your CPU.

Then inside your loop, something like:

READS=basecalls.fastq.gz 
CALLDIR=tldrdirectory
SAMPLE=C0
while read p; do
UUID=$p
CONS=${CALLDIR}/${UUID}.cons.ref.fa
BAM=${CALLDIR}/${SAMPLE}.${UUID}.te.bam
TSV=${CALLDIR}/${SAMPLE}.${UUID}.te.meth.tsv
f5c call-methylation -t32 -p32 -K4096 -B100M -r $READS -g $CONS -b $BAM > $TSV
done <UUID_allsamples.txt

-t and -p are threads and I/O processes. -K and -B are batch sizes that you can tune based on available RAM for multithreaded efficiency.