hkuwahara / sleaping


core dumped when processing big fastq file #1

Closed yanlifeng closed 1 year ago

yanlifeng commented 1 year ago

Hi! I am using fadso to downsample FASTQ data (SRR7890824, about 200 GB), but I am encountering the following error:

./fadso single -r 12345 -k 10000000 -i /home/bigssd/ylf_data/SRR7890824_1.fastq -m sleaping -o p.fq
fadso downsampler
realloc(): invalid next size
Aborted (core dumped)

P.S. I tested a few smaller datasets (about 10 GB each) and they worked fine; only this large one fails.

If possible, please give me a hand. Thanks a lot!

hkuwahara commented 1 year ago

Thank you for reporting this issue. I will use the same fastq data to see if I can recreate the error or if the problem is random.

I just ran fadso under two settings on SRR7890824 as follows:

./fadso single -r 12345 -k 10000000 -i SRR7890824.sra_1.fastq -m sleaping -o p2.fastq
./fadso single -r 12345 -k 10000000 -i SRR7890824.sra_1.fastq.gz -m sleaping -o p2.fastq

In both cases, I got the output within a few minutes without running into an error. So I don't think the error you got has anything to do with the specific nature of SRR7890824 (including its size). Could you tell me what computational environment you used to run fadso?


yanlifeng commented 1 year ago

Thanks for the quick reply!

Here's my runtime environment:
OS: Ubuntu 20.04
CPU: Intel(R) Xeon(R) Platinum 8260
RAM: 256 GB
Storage: 1 TB SSD, 32 TB HDD

I also see the same realloc error on this much smaller dataset:

./fadso single -r 12345 -k 10000 -i test_data/SRR15616876_1.fastq -m sleaping -o p.fq
fadso downsampler
realloc(): invalid next size
Aborted (core dumped)

Also, I'm not sure whether something is wrong with the way I'm dumping the data. Here is the process I use to get it:

wget https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR15616876/SRR15616876
mv SRR15616876 SRR15616876.sra
fasterq-dump -e 20 SRR15616876.sra
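One way to rule out a malformed dump before suspecting the downsampler is a quick structural check of the FASTQ file. The sketch below is only an illustration and is not part of fadso: the function name `check_fastq` is hypothetical, and it assumes unwrapped four-line records (header starting with `@`, sequence, separator starting with `+`, quality string the same length as the sequence).

```python
def check_fastq(path, max_records=None):
    """Count well-formed four-line FASTQ records; raise ValueError on the first bad one.

    Assumes unwrapped records. A blank line inside or at the end of the file
    is reported as a malformed record.
    """
    count = 0
    with open(path, "r") as handle:
        while max_records is None or count < max_records:
            header = handle.readline()
            if not header:
                break  # clean end of file
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            record_no = count + 1
            if not header.startswith("@"):
                raise ValueError(f"record {record_no}: header does not start with '@'")
            if not plus.startswith("+"):
                raise ValueError(f"record {record_no}: separator does not start with '+'")
            if len(seq) != len(qual):
                raise ValueError(f"record {record_no}: sequence and quality lengths differ")
            count += 1
    return count
```

Calling `check_fastq("test_data/SRR15616876_1.fastq")` would return the number of records if the file is structurally sound; passing `max_records` limits the scan for very large files.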

The md5 value of SRR15616876_1.fastq is c84604da9c9312136a713ec8d4b2a380, and I have also put this file on Google Cloud Drive.
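For anyone who wants to compare their own copy against that checksum, here is a minimal sketch using Python's standard `hashlib`. The helper name `md5_of_file` is hypothetical; the file is read in chunks so that multi-gigabyte FASTQ files do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example comparison against the checksum quoted in this thread:
# md5_of_file("SRR15616876_1.fastq") == "c84604da9c9312136a713ec8d4b2a380"
```

A matching digest would indicate the local file is byte-identical to the one described here, which separates download or dump problems from problems in the tool itself.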

hkuwahara commented 1 year ago

Thank you for showing another test case. I believe I fixed this problem. Could you please download the latest version and test it on the datasets that you could not downsample?

yanlifeng commented 1 year ago

Hi! I have downloaded the latest version of the software and conducted the testing. The issue has been successfully resolved. Your assistance was invaluable. Thank you!