Closed sadamowi closed 7 years ago
Sounds good to me, I'll update to secure dishes and then run through Mollusca at the class level. Did you want me to use placeholder references for Mollusca?
Also, I will try to address the other errors with the plot and plotly map. If you think the Annelida script is in a good state now, I could also work on an updated order-level analysis script if you want?
Also, really glad the results match now!
Best Regards, Matt
All great news! Thanks! I will send real mollusc seqs. I think we want to aim for real results now. Will reply more tomorrow.
Hi again Matt,
Thank you for having a look at the error messages relating to the plotting. I think those plots are helpful, and it would be great if you are able to sort out those functions to run in the updated version of R.
Have you been in contact with Winfield? Do you know if he is also actively testing the code? I suggest mentioning the versioning decision to him, as well as the Annelida update. Thank you.
I will prepare Mollusca sequences and hopefully send those today. Does that work for you, given other commitments, if your computer has to chug away for a day or two on that larger phylum?
Thank you for trying that. It occurred to me that at the very least we should be able to speed up the alignment step at the centroid stage. Within BINs, sequence variability is very low, often <1% and usually up to a maximum of about 2.5%. So, we should be able to gain alignment efficiency there. That would be worth considering.
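As a rough illustration of the within-BIN variability point (a sketch only, not code from the R-Scripts repository): the snippet below computes the uncorrected pairwise distance (p-distance) between already-aligned sequences and reports the largest value in a group. The function names and the toy sequences are made up for illustration, and skipping gap and `N` positions is an assumed convention.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') or an 'N' are skipped
    (an assumed convention for this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def max_pairwise_distance(seqs):
    """Largest p-distance over all pairs in a group of aligned sequences."""
    return max((p_distance(a, b) for a, b in combinations(seqs, 2)), default=0.0)


# Toy, made-up sequences standing in for the members of one BIN.
bin_seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
print(max_pairwise_distance(bin_seqs))
```

A group whose maximum distance stays in the low single-digit percent range is the kind of input where alignment shortcuts for similar sequences tend to pay off.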
About the order-level analysis, I suggest that we check whether Jacqueline can check the Annelida code in the near future. It would seem most efficient to proceed to the order pipeline after we are confident in the final version of the class pipeline. I'd like to be mindful of your other commitments. What do you think?
Cheers, Sally
Hi Sally,
I haven't heard from Winfield in a few days but I'll contact him and let him know about the script changes and the updated versioning.
No problem on Mollusca, I should be able to run through it this weekend and let you know how it goes.
For the centroid alignment, maybe we could set the diags setting to True on the muscle command to speed it up more? For Annelida it seemed like it was able to run through each BIN quite quickly but maybe for larger taxa it would become useful to have this setting turned on.
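For reference, a minimal sketch of what turning on the diags setting looks like at the command-line level. It assumes the MUSCLE v3.8 command-line program, whose `-in`, `-out` and `-diags` options are used here; the scripts in this thread call muscle from R, where the option is passed differently, and the file names below are hypothetical.

```python
import subprocess


def muscle_command(in_fasta, out_fasta, diags=True):
    """Build the argument list for a MUSCLE v3.8 run.

    The -diags flag asks MUSCLE to look for diagonals, which is faster
    when the input sequences are very similar (as within a BIN).
    """
    cmd = ["muscle", "-in", in_fasta, "-out", out_fasta]
    if diags:
        cmd.append("-diags")
    return cmd


def run_muscle(in_fasta, out_fasta, diags=True):
    """Run MUSCLE; requires the muscle executable to be on the PATH."""
    subprocess.run(muscle_command(in_fasta, out_fasta, diags), check=True)


# Hypothetical file names, shown only to print the resulting command.
print(" ".join(muscle_command("bin_sequences.fas", "bin_aligned.fas")))
```

Building the command separately from running it makes it easy to check the exact options before starting a long run on a larger taxon.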
As for the order level analysis, I agree that it would be good for Jacqueline to take a look at the Annelida script first before proceeding further.
Best Regards, Matt
Hi Matt,
Thanks for getting in touch with Winfield.
I agree about your suggested setting change for the centroid alignment step. Indeed, there are many more total sequences and BINs for Mollusca (particularly class Gastropoda) than for Annelida.
Would you mind making that change in the Annelida branch too? That way, I would use the same setting as you as I run through the Annelida code a final time with the newest R version.
Also, as an update ... I received my Compute Canada renewal notice today. I have asked Jacqueline if she has example job files. I understand the procedure for submitting jobs may be somewhat different compared to what the McGill folks use for the Quebec cluster. So, hopefully I can obtain an example file so that we can use that resource too, as needed. I would plan to submit a small job first, one we can also run on a local computer, for comparison prior to moving to a big task like Arthropoda.
Cheers,
Sally
Hi Sally,
Winfield just got back to me confirming he would help test the Annelida code. I made sure to mention the updated versioning as well.
I also just updated the script and set diags to true in the muscle command for the Annelida branch.
Good to hear about Compute Canada, hopefully it will be a useful resource for us.
Best Regards, Matt
Thank you very much Matt. I will update you as I complete the above issues.
Cheers, Sally
Haha, guess we know where these version names are coming from.
Hi Matt,
I am happy to report that steps 1-3 above are complete. I got the same results using the newer R (dishes) compared to the previous version (pumpkin). As well, with the exception of the previous errors relating to plotting (and I think one new package that seems to be needed), everything ran smoothly, and there were no new errors.
I like how you have the FASTA files now optional for exporting but in a streamlined set of commands, covering all classes present, without repeating the alignment.
I will proceed with the other tasks.
Cheers, Sally
Hi Sally,
Glad you like the FASTA commands and the script is running smoothly. I'm currently running through the centroid alignments for Mollusca. Seems to be running well so far.
Best Regards, Matt
Hi Matt,
I am happy to report that tasks 1-9 above are complete. I will close this issue and generate a new task list.
Cheers, Sally
Notes about my tasks:
(For step #9, I was wondering if you'd be willing to try to run Mollusca at the class level on your better computer? If that's possible, I think that would be a better choice for that phylum. However, if we do end up needing to go with order, we would lose some unidentified sequences, or sequences identified using an alternative taxonomic hierarchy, but I think that isn't catastrophic for the project. I contacted Compute Canada, but they indicated they are still awaiting approval from Guelph for my account. Hopefully that will be sorted out soon and so we would hopefully have access to more computing resources if needed.)
Let me know if I missed something!
Cheers, Sally