luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License

Boost::filesystem::status: File name too long #157

Closed: Luisitox closed this issue 3 years ago

Luisitox commented 3 years ago

Hi! I am using a freshly installed conda environment (octopus version 0.6.3-beta, Target: x86_64 Linux 5.4.0-1041-azure), created with the command

conda create --name octopus -c conda-forge -c bioconda octopus -y

and I am getting the following error:

octopus --threads 1 --organism-ploidy 4 --reference pseudomolecule.fa --reads-file BIG_RG.bam -o BIG_octopus.vcf
[2021-04-04 23:57:32] <INFO> ------------------------------------------------------------------------
[2021-04-04 23:57:32] <INFO> octopus v0.6.3-beta
[2021-04-04 23:57:32] <INFO> Copyright (c) 2015-2019 University of Oxford
[2021-04-04 23:57:32] <INFO> ------------------------------------------------------------------------
[2021-04-04 23:59:51] <EROR> An unclassified error has occurred:
[2021-04-04 23:59:51] <EROR>
[2021-04-04 23:59:51] <EROR>     Boost::filesystem::status: File name too long:
[2021-04-04 23:59:51] <EROR>     "[unprintable binary data omitted]"
[2021-04-04 23:59:51] <EROR>
[2021-04-04 23:59:51] <EROR> To help resolve this error submit an error report.
[2021-04-04 23:59:51] <INFO> ------------------------------------------------------------------------

Do you know how to overcome this error?

Thanks,

Luis

dancooke commented 3 years ago

Hi, thanks for the bug report. Can you please post the fasta index of pseudomolecule.fa?

Luisitox commented 3 years ago

Sure, here it is (I just added the .txt extension so that GitHub would accept the upload):

pseudomolecule.fa.fai.txt

dancooke commented 3 years ago

Thanks. Doesn't look like there should be anything problematic there. I notice you're using v0.6.3b, which is very old. It's possible that this has already been resolved somewhere along the line (although I don't recall having come across this before). If you can send a small test example that triggers the problem then I can take a look. Otherwise, the Bioconda version should be updated in the next few days to the latest version, so you can wait until that's available and try again.

dancooke commented 3 years ago

The Bioconda version has finally been updated to v0.7.4 - if you're still having this issue please can you try updating and re-running?

Luisitox commented 3 years ago

Hi Dan, thanks for letting me know.

I reinstalled the updated software but I'm still having the same error:

(octopus) [luisdh@access1 supermerged]$ octopus --organism-ploidy 4 --reference pseudomolecule.fa --reads-file HIGH_L1_merged.bam -o HIGH_octopus.vcf
[2021-05-18 11:59:24] <INFO> ------------------------------------------------------------------------
[2021-05-18 11:59:24] <INFO> octopus v0.7.4
[2021-05-18 11:59:24] <INFO> Copyright (c) 2015-2021 University of Oxford
[2021-05-18 11:59:24] <INFO> ------------------------------------------------------------------------
[2021-05-18 12:03:14] <EROR> An unclassified error has occurred:
[2021-05-18 12:03:14] <EROR>
[2021-05-18 12:03:14] <EROR>     Boost::filesystem::status: File name too long:
[2021-05-18 12:03:14] <EROR>     "[unprintable binary data omitted]"
[2021-05-18 12:03:14] <EROR>
[2021-05-18 12:03:14] <EROR> To help resolve this error submit an error report.
[2021-05-18 12:03:14] <INFO> ------------------------------------------------------------------------

I tried different BAM files, with and without the ploidy parameter, and I get the same error each time; only the garbled characters vary.

I would really like to use your software as I am working on a tetraploid... Do you have any hint about what might be causing the error?

dancooke commented 3 years ago

Ok. Is it possible for you to send me the reference fasta and a very small subset of the BAM?

Luisitox commented 3 years ago

Great, thanks, I just sent the files by email

dancooke commented 3 years ago

Hmm, I've tried a few things but cannot reproduce the error. Do you get the same error when running on the subsetted BAM you sent to me?

Luisitox commented 3 years ago

Wow, that's strange. I did confirm the error with those exact files before sending them to you:

[luisdh@access1 test_octopus]$ conda --version
conda 4.10.1
[luisdh@access1 test_octopus]$ conda activate octopus
(octopus) [luisdh@access1 test_octopus]$ octopus --version
octopus version 0.7.4
Target: x86_64 Linux 5.4.0-72-generic
SIMD extension: AVX2
Compiler: GNU 9.3.0
Boost: 1_74
(octopus) [luisdh@access1 test_octopus]$ md5sum *
843f080842b83fcc580367d22264c766  Cq_PI614886_genome_V1_pseudomolecule.fa
0a79e319dabeb0e3dcc66cc076797e13  subset.bam
(octopus) [luisdh@access1 test_octopus]$ octopus --organism-ploidy 4 --reference Cq_PI614886_genome_V1_pseudomolecule.fa --reads-file subset.bam -o subset.vcf
[2021-05-19 09:36:41] <INFO> ------------------------------------------------------------------------
[2021-05-19 09:36:41] <INFO> octopus v0.7.4
[2021-05-19 09:36:41] <INFO> Copyright (c) 2015-2021 University of Oxford
[2021-05-19 09:36:41] <INFO> ------------------------------------------------------------------------
[2021-05-19 09:36:42] <EROR> An unclassified error has occurred:
[2021-05-19 09:36:42] <EROR>
[2021-05-19 09:36:42] <EROR>     Boost::filesystem::status: File name too long:
[2021-05-19 09:36:42] <EROR>     "[unprintable binary data omitted]"
[2021-05-19 09:36:42] <EROR>
[2021-05-19 09:36:42] <EROR> To help resolve this error submit an error report.
[2021-05-19 09:36:42] <INFO> ------------------------------------------------------------------------

I also tried different servers on my cluster, with the same result.

dancooke commented 3 years ago

Ahhhhh - I should have paid closer attention to your command! The problem is that you're passing the input BAM to the --reads-file option. That option is for when you have a list of BAM files in a text file (see wiki). You just want the --reads option (short form -I):

$ octopus --organism-ploidy 4 --reference Cq_PI614886_genome_V1_pseudomolecule.fa --reads subset.bam -o subset.vcf
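For contrast, here is a rough sketch of how the two options might be used side by side. All file names (sampleA.bam, sampleB.bam, bam_list.txt, reference.fa, and the output VCFs) are hypothetical placeholders, not files from this thread, and the octopus calls only run if octopus is on the PATH:

```shell
# Text file with one BAM path per line (hypothetical file names).
printf '%s\n' sampleA.bam sampleB.bam > bam_list.txt

if command -v octopus >/dev/null 2>&1; then
    # A text file that lists BAMs is passed with --reads-file ...
    octopus --organism-ploidy 4 --reference reference.fa \
        --reads-file bam_list.txt -o cohort.vcf

    # ... whereas a BAM itself is passed with --reads (short form -I).
    octopus --organism-ploidy 4 --reference reference.fa \
        --reads sampleA.bam -o sampleA.vcf
fi
```
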

I'll add some checks to catch this input error.
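Until such a check exists in octopus itself, a user-side pre-flight test could look something like the sketch below. It is not octopus code: looks_like_bam and check_reads_file are hypothetical helper names. It relies on the fact that BAM files are BGZF-compressed and therefore begin with the gzip magic bytes 1f 8b, whereas a plain-text list of paths does not. A plausible reading of the log above (an assumption, not confirmed in this thread) is that the binary BAM content was read as if each "line" were a file path, which would explain the over-long "file names".

```shell
# Return 0 if the file starts with the gzip magic bytes 1f 8b, as every
# BGZF-compressed BAM does; a plain-text list of paths will not.
looks_like_bam() {
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Hypothetical pre-flight check for a file intended for --reads-file.
check_reads_file() {
    if looks_like_bam "$1"; then
        echo "error: $1 looks like a BAM; pass it with --reads, not --reads-file" >&2
        return 1
    fi
    # Every non-empty line should name an existing file.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue
        if [ ! -e "$path" ]; then
            echo "error: listed file not found: $path" >&2
            return 1
        fi
    done < "$1"
    return 0
}
```

Calling check_reads_file on the intended --reads-file argument before launching octopus would have flagged the command in this issue immediately. Note that the magic-byte test only recognises gzip/BGZF data, so a CRAM file passed by mistake would not be caught by looks_like_bam (though it would most likely still fail the listed-file check).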

Luisitox commented 3 years ago

Ouch, thanks!