marcpaga / esox


Error: need at least one array to concatenate #4

Closed panp-to closed 4 months ago

panp-to commented 4 months ago

Hi Marc,

I ran into an error when running esox on my own data (someone reported a similar problem in a previous issue). Do you have any idea why this error might arise? I guess it has something to do with my data, because the example data on GitHub runs successfully. First, I used the raw fast5 data (obtained directly after nanopore sequencing; I'm not sure whether they are multi-read fast5 files), and it caused the error. Then I used the multi-to-single conversion from ont-fast5-api and passed the single-read fast5 files to esox, and that also caused the error. (All basecalling was done with Guppy.)

Traceback (most recent call last):
  File "/scripts/basecall.py", line 255, in
    array_keeper[k] = np.concatenate(v, axis=0)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate

Thanks.

marcpaga commented 4 months ago

Hi @panp-to,

Would it be possible to share the file that is causing the problem? It's a bit difficult to debug otherwise. Also, is it R9 or R10 data? Our model only works with R9.

panp-to commented 4 months ago

Yes, I'd be happy to share it. Could I send it to your email? Is the author email available (the one with the mailbox suffix @umcutrecht.nl)? Our data is from an R9.4.1 sequencing chip.

marcpaga commented 4 months ago

You can send it to me, m.pagesgallego + the suffix you wrote.

panp-to commented 4 months ago

Ok, the email has been sent. Thanks.

marcpaga commented 4 months ago

Hi @panp-to,

I have tried to run your data and I get the same error. The problem is that the read ids in the fast5 files do not match the read ids in the fastq files.

For example, for the files named PAO93366_pass_barcode32_d092a066_f6e67411_1, the following read ids are in the fast5 file:

00365f82-9887-4900-873e-fbc9a56730e9
0062be75-d6db-4d6e-9c11-56578d2f07cc
00886cc6-9c33-4e00-a723-7b102dc0fcfb
008b523a-2c91-4ed1-bfec-ae1a89700ab5
009a2909-88df-4ecc-a669-b2f79aa8f8e1

But these read ids cannot be found in the fastq file.
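
A mismatch like this can be checked before running the basecaller. The following is only a minimal sketch, not part of esox: the function names are made up for illustration, the FASTQ file is assumed to be uncompressed with four lines per record, and `fast5_read_ids` assumes that the ont-fast5-api package is installed.

```python
def fastq_read_ids(fastq_path):
    """Return the set of read ids in a plain-text FASTQ file.

    Assumes uncompressed input with exactly four lines per record.
    """
    ids = set()
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0 and line.startswith("@"):
                # Header looks like "@<read_id> key=value ..."; keep only the id.
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def fast5_read_ids(fast5_path):
    """Return the set of read ids in a fast5 file.

    Assumption: ont-fast5-api is installed; imported lazily so the
    FASTQ helpers above work without it.
    """
    from ont_fast5_api.fast5_interface import get_fast5_file

    with get_fast5_file(str(fast5_path), mode="r") as f5:
        return {read.read_id for read in f5.get_reads()}


def unmatched_ids(fast5_ids, fastq_ids):
    """Read ids present in the fast5 file but absent from the FASTQ file."""
    return sorted(set(fast5_ids) - set(fastq_ids))
```

If `unmatched_ids(fast5_read_ids(...), fastq_read_ids(...))` returns every id in the fast5 file, no read can be paired with a basecall, which is consistent with the empty list reaching `np.concatenate` in the traceback.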

panp-to commented 4 months ago

Yes, after running my data through Guppy, two outputs (FASTQ-fail and FASTQ-pass) appear. I guess the missing reads belong to the failed part, so they do not appear in the final fastq file. The solution I've come up with so far is to split the fast5 data into single-read files, basecall them with Guppy into a single fastq file, and make sure the two correspond so that it works. I haven't been able to think of a more convenient way so far. Thanks for your help.
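
The guess that the missing reads ended up in Guppy's fail output could be tested directly by searching the fail FASTQ files for the read ids listed earlier. This is a rough sketch under assumptions: the helper name is hypothetical, the files are assumed to be uncompressed `*.fastq` with four lines per record, and the directory passed in is whatever folder holds the fail output.

```python
from pathlib import Path


def find_ids_in_fastq_dir(fastq_dir, wanted_ids):
    """Return the subset of wanted_ids found in any *.fastq file under fastq_dir.

    Assumes uncompressed FASTQ files with exactly four lines per record.
    """
    wanted = set(wanted_ids)
    found = set()
    for fastq_path in sorted(Path(fastq_dir).rglob("*.fastq")):
        with open(fastq_path) as handle:
            for line_number, line in enumerate(handle):
                # Only the first line of each record carries the read id.
                if line_number % 4 != 0 or not line.startswith("@"):
                    continue
                fields = line[1:].split()
                if fields and fields[0] in wanted:
                    found.add(fields[0])
    return found
```

For example, calling it with the fail directory and the id `00365f82-9887-4900-873e-fbc9a56730e9` would show whether that read was basecalled but filtered out of the pass file.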