lvclark / polyRAD

Genotype Calling with Uncertainty from Sequencing Data in Polyploids 🍌🍓🥔🍠🥝

Error message on R when reading isoloci #36

Closed: deilepaita closed this issue 1 year ago

deilepaita commented 1 year ago

Hello Lindsay,

Thanks for the polyRAD package, it is well documented and the tutorials are awesome!

I am following the steps in "Variant and Genotype Calling in Highly Duplicated Genomes" (https://lvclark.r-universe.dev/articles/polyRAD/isolocus_sorting.html) to call variants on my own dataset, a Miscanthus F2 diploid population. Everything works up to the point of importing the sorted dataset into polyRAD. However, when I try to import the sorted .csv file generated by process_isoloci.py into R with polyRAD's readProcessIsoloci function, I get the following error:

```r
myRAD <- readProcessIsoloci("20230315_out_sam_multi_1_sorted.csv",
                            possiblePloidies = list(2), taxaPloidy = 2)
Reading file...
Filtering and sorting loci...
Building RADdata object...
Error in sum(sapply(seq_len(nsites), function(i) polyRADsubmat[splitnuc1[[a1]][i], :
  invalid 'type' (list) of argument
```

I don't quite understand the error message. Could you help me figure out what I am doing wrong? I tried changing the default arguments, but I still get the same error message no matter what I do.

Thanks,

lvclark commented 1 year ago

Hello,

In order to debug this, I'd need a small dataset that could be used to reproduce the error on my own machine.

deilepaita commented 1 year ago

Here it is.

Before posting here, I tested the code with the first 1000 lines and it worked, with no error reported in R. However, the error appears when I use the whole file (~270K lines). So I tried more lines (~2000), and then I noticed some blanks in the third and fourth columns of the original file. Could that be the problem?

test_2000lines.csv

Thanks for your quick reply!
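To locate blanks like the ones described above in a file of ~270K lines, a short scan is quicker than reading it by eye. This is a minimal Python sketch. It assumes the sorted file is plain comma-separated text and that the third and fourth columns (0-based indices 2 and 3) are the ones to check. `find_blank_cells` is a hypothetical helper and is not part of polyRAD or process_isoloci.py:

```python
import csv
import io
from typing import Iterable, List, Tuple


def find_blank_cells(rows: Iterable[List[str]],
                     columns: Tuple[int, ...] = (2, 3)) -> List[Tuple[int, int]]:
    """Return 1-based (row_number, column_number) pairs for cells that are
    empty or whitespace-only in the given 0-based columns.

    Hypothetical helper: the default columns (third and fourth) are an
    assumption based on where the blanks were noticed, not a documented
    layout of the process_isoloci.py output.
    """
    hits = []
    for row_num, row in enumerate(rows, start=1):
        for col in columns:
            # A row too short to reach this column also counts as blank.
            value = row[col] if col < len(row) else ""
            if value.strip() == "":
                hits.append((row_num, col + 1))
    return hits


# Small in-memory example: row 2 has an empty third column,
# row 3 has a whitespace-only fourth column.
sample = io.StringIO("a,b,ACGT,x\nc,d,,y\ne,f,TTGA, \n")
print(find_blank_cells(csv.reader(sample)))  # [(2, 3), (3, 4)]

# On a real (hypothetical) path:
# with open("20230315_out_sam_multi_1_sorted.csv", newline="") as fh:
#     for row_num, col in find_blank_cells(csv.reader(fh)):
#         print(f"row {row_num}: column {col} is blank")
```

Row numbers count every line of the file, including any header row, so they line up with the line numbers shown in a text editor.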

lvclark commented 1 year ago

Sorry for the delayed reply. The blanks in the allele column could definitely cause this error. I see in Chr01-003677744-bot that one of the tags is much shorter than the other two, and I bet the issue has something to do with that. If you could send me the corresponding lines from the align.csv file that would be helpful.
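A similar sketch can flag loci whose tags differ in length, as in Chr01-003677744-bot. It assumes, hypothetically, that the locus name is in the first column and the tag sequence is in the third. `loci_with_uneven_tags` is an illustrative helper, and the real column layout of the sorted file should be checked before relying on it:

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def loci_with_uneven_tags(rows: Iterable[List[str]], locus_col: int = 0,
                          tag_col: int = 2) -> Dict[str, List[int]]:
    """Map each locus name to the lengths of its tags, keeping only loci
    whose tags are not all the same length.

    Column indices are 0-based and hypothetical; adjust them to the
    actual layout of the file being checked.
    """
    lengths: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        if len(row) <= max(locus_col, tag_col):
            continue  # skip rows too short to contain both columns
        lengths[row[locus_col]].append(len(row[tag_col].strip()))
    # A locus is reported when its tag lengths take more than one value.
    return {locus: lens for locus, lens in lengths.items() if len(set(lens)) > 1}


# Synthetic example: locusA has one tag much shorter than the other two.
sample = io.StringIO(
    "locusA,x,ACGTACGT\nlocusA,x,ACGTACGA\nlocusA,x,ACG\n"
    "locusB,x,TTTT\nlocusB,x,TTTA\n"
)
print(loci_with_uneven_tags(csv.reader(sample)))  # {'locusA': [8, 8, 3]}
```

Rows are grouped by the value in the locus column, so a header row forms its own single-entry group and is never reported.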

deilepaita commented 1 year ago

Not a problem.

There are many cases like that one (Chr01-003677744-bot) in the sorted file. Attached are the lines from the align.csv file that correspond to Chr01-003677744-bot.

20230315_out_sam_multi_1_align_Chr01-003677744-bot.csv

Thanks for your help!

lvclark commented 1 year ago

Sorry again for the delayed response. I just pushed a bug fix to process_isoloci.py that I think will solve your problem. Let me know how it goes!

deilepaita commented 1 year ago

Thank you. I will let you know how it goes!

deilepaita commented 1 year ago

It works perfectly. I got no error this time.

Thanks for your help!