najohink closed this 4 weeks ago
I forgot to add the photo of the unfiltered fastq output plot:
I am very confused and will need to think about this.
I filtered my dataset with FiltLong before running NanoComp and getting the weird result.
In the meantime, I figured out how to do what I wanted by running this:
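import pickle
import numpy
import matplotlib.pyplot as plt
with open('barcode03_1-27kb_NanoComp-data.pickle', 'rb') as fh:
    df3 = pickle.load(fh)
bins = numpy.arange(0, 30000, 500)
# Number of reads per 500 bp length bin
h3 = numpy.histogram(df3['lengths'], bins=bins)
plt.bar(h3[1][:-1], height=h3[0], width=450)
# Bin centres, and approximate bases per bin (bin centre * read count)
xdata3 = (h3[1][:-1] + h3[1][1:]) / 2
ydata3 = xdata3 * h3[0]
plt.bar(xdata3, ydata3, width=450)
# Fraction of all bases that fall in bins centred above 25 kb
ydata3[xdata3 > 25000].sum() / ydata3.sum()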
I wanted to know what percentage of the total bases came from my full-length sequence. So I wanted to divide the number of bases in the ~26kb reads by the total number of bases, while also keeping the weird long stuff out of the dataset, hence the filtering with FiltLong.
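For what it's worth, the same fraction can also be computed straight from the read lengths, which avoids the approximation of multiplying bin centres by read counts. This is only a minimal sketch: the function name `fraction_of_bases_above` and the example lengths are made up for illustration, and it assumes `lengths` is any iterable of read lengths (for example the values of the `lengths` column used above).

```python
def fraction_of_bases_above(lengths, threshold):
    """Fraction of all bases that sit in reads longer than `threshold`."""
    lengths = list(lengths)
    total = sum(lengths)
    if total == 0:
        raise ValueError("no bases in input")
    return sum(n for n in lengths if n > threshold) / total

# Made-up read lengths, not taken from the dataset in this thread
example_lengths = [1000, 4000, 26000, 26000, 3000]
print(fraction_of_bases_above(example_lengths, 25000))
```

Because every read contributes its exact length, the result does not depend on the bin width.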
Do the plots without weighting look normal? I will explain later what those terms mean when I'm at the computer...
Yes, the others look normal. Only the two weighted plots have negative values.
So normalized plots mean that every dataset in the plot adds up to "1", so datasets with significant differences in yield can still be compared on length. Without normalization, just the number of reads is used. And weighted means that instead of the number of reads per bin, the number of bases per bin is used (as is also the case in the MinKNOW interface). As such, a read of 25000 bases in the 24000-26000 bin will increase the count on the y-axis by 25000 rather than by just 1.
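The difference between the two options can be sketched in a few lines of plain Python. This is only an illustration of the idea described above, not NanoComp's actual implementation; the function `length_histogram`, its parameters and the read lengths are all made up for the example.

```python
def length_histogram(lengths, bin_width, weighted=False, normalized=False):
    """Return {bin_start: value}; value is reads per bin, or bases per bin if weighted."""
    hist = {}
    for n in lengths:
        start = (n // bin_width) * bin_width
        # Weighted: a read adds its length in bases. Unweighted: it adds 1.
        hist[start] = hist.get(start, 0) + (n if weighted else 1)
    if normalized:
        # Normalized: rescale so that the values of this dataset sum to 1.
        total = sum(hist.values())
        hist = {start: value / total for start, value in hist.items()}
    return hist

reads = [1000, 1500, 25000, 25500]  # made-up read lengths
print(length_histogram(reads, 2000))                   # reads per bin
print(length_histogram(reads, 2000, weighted=True))    # bases per bin
print(length_histogram(reads, 2000, normalized=True))  # fractions summing to 1
```

In this toy example the 24000-26000 bin holds two reads, so it counts 2 in the unweighted histogram and 50500 in the weighted one.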
Do you think it would be possible to share the data that caused this?
So I haven't been able to replicate this. Please let me know if someone runs into a similar issue.
Hello,
I am using NanoComp v1.23.1 and got a weird plot after filtering my input fastq files (see attached image).
When I ran the same command on unfiltered input fastq files, I got normal plots. But after filtering my fastq files to keep only 1-27kb reads, I now get negative values in the weighted plots. Is this "normal"?
Can you also explain the difference between weighted and normalized?
best, S