Closed jasondk closed 8 years ago
I'll try to fix this soon. I don't think the fix is easy if you don't know the ExaML code well. In future, please use the RAxML Google group for reporting bugs, so that all users are aware of potential problems.
Alexis
On 11.05.2015 19:43, A.P. Jason de Koning wrote:
In |axml.h|, the |rawdata->sites| variable is defined as type |int|. Attempting to compress an alignment with about 3B positions is resulting in a "too few sites" error, presumably because we are overflowing the |int|. We will also have more than 32k site patterns after compression, and some of these will occur more than 32k times in the dataset - so we will still be causing overflows in the site/alias indexes even after changing |sites| to |long long int|. Is there a quick fix for this? Thanks!
Alexandros (Alexis) Stamatakis
Research Group Leader, Heidelberg Institute for Theoretical Studies Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson
www.exelixis-lab.org
Dear Jason,
I think that I have fixed it but I need access to the dataset for testing.
Cheers,
Alexis
Hey Alexis,
Thanks so much and sorry for the delay in responding. You can download the compressed dataset (5GB, sorry!) here http://hyperion.ucalgary.ca/example.phy.bz2. I’ll leave the link up for a couple of days. If you have a problem downloading it, you could just simulate a similar dataset. The dimensions are 7 OTUs and 3,036,303,846 sites with very little divergence (most of this will compress out if indexing site patterns).
Best wishes,
A.P. Jason de Koning, Ph.D.
Assistant Professor University of Calgary, Faculty of Medicine and Alberta Children's Hospital Research Institute for Child and Maternal Health Dept. of Biochemistry and Molecular Biology Dept. of Medical Genetics
Health Sciences Centre 1150 Suite 3330 Hospital Drive N.W. Calgary, Alberta T2N 4N1 Canada
Office: 403-210-7638 | Fax: 403-270-8928 Email: jason.dekoning@ucalgary.ca Web: http://lab.jasondk.io
Hi Jason,
The modified parser works now. How quickly do you need the fix?
I am in the middle of a larger re-design, so the code with the fixed parser is not ready for release yet.
Below is the output of the parser. Does that look right? It looks rather weird to me.
Alexis
Pattern compression: ON
Alignment has 200630281 completely undetermined sites that will be automatically removed from the binary alignment file
Your alignment has 5956 unique patterns
Under CAT the memory required by ExaML for storing CLVs and tip vectors will be 1375836 bytes 1343 kiloBytes 1 MegaBytes 0 GigaBytes
Under GAMMA the memory required by ExaML for storing CLVs and tip vectors will be 5378268 bytes 5252 kiloBytes 5 MegaBytes 0 GigaBytes
Please note that, these are just the memory requirements for doing likelihood calculations! To be on the safe side, we recommend that you execute ExaML on a system with twice that memory.
Binary and compressed alignment file written to file HUGE.binary
Parsing completed, exiting now ...
Hey Alexis, this looks approximately correct to me. We’d previously run just the variable sites from this dataset and had similar results. Can you possibly make the binary output of the parser for this dataset available to us for download? Or allow us access to the revised parser? This is for the last piece of a student project that is otherwise complete. Thanks! Jason
just sent the code to your university email,
alexis
In |axml.h|, the |rawdata->sites| variable is defined as type |int|. Attempting to compress an alignment with about 3B positions is resulting in a "too few sites" error, presumably because we are overflowing the |int|. We will also have more than 32k site patterns after compression, and some of these will occur more than 32k times in the dataset - so we will still be causing overflows in the site/alias indexes even after changing |sites| to |long long int|. Is there a quick fix for this? Thanks!