BUSCO datasets - Githubissues

kokyriakidis commented 5 years ago

Hello, do I have to choose which datasets to include, or could I use them all? I am running an analysis on Chelonia mydas.

The lineage is

Lineage( full )
cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Sauropsida; Sauria; Archelosauria; Testudines; Cryptodira; Durocryptodira; Americhelydia; Chelonioidea; Cheloniidae; Caretta

Should I use the Tetrapoda dataset? Should I use Tetrapoda AND Eukaryota? Or should I use more?

AdamStuckert commented 5 years ago

Tetrapoda will probably be the most informative for your purposes.

macmanes commented 5 years ago

But to keep it simple, for running the assembly itself, just stick with the default Euk database. Once you have an assembly, I agree that Tetrapoda will be good!
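
As a rough illustration of the advice above (pick the most specific lineage dataset that covers your organism), here is a minimal sketch. The `AVAILABLE` set is an assumption made for the example, not the full list of BUSCO datasets, and `most_specific_dataset` is a hypothetical helper, not part of BUSCO or the Oyster River Protocol.

```python
# Full lineage as posted in the question, most general taxon first.
LINEAGE = (
    "cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; "
    "Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; "
    "Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; "
    "Dipnotetrapodomorpha; Tetrapoda; Amniota; Sauropsida; Sauria; "
    "Archelosauria; Testudines; Cryptodira; Durocryptodira; "
    "Americhelydia; Chelonioidea; Cheloniidae; Caretta"
)

# ASSUMPTION: an illustrative subset of lineage datasets you have on disk.
# Check the BUSCO site for the datasets that actually exist for your release.
AVAILABLE = {"eukaryota", "metazoa", "vertebrata", "tetrapoda"}


def most_specific_dataset(lineage, available):
    """Return the deepest taxon in `lineage` that has a dataset, or None."""
    taxa = [t.strip().lower() for t in lineage.split(";") if t.strip()]
    # Walk from the most specific taxon back toward the root.
    for taxon in reversed(taxa):
        if taxon in available:
            return taxon
    return None


print(most_specific_dataset(LINEAGE, AVAILABLE))  # prints "tetrapoda"
```

With this assumed set of datasets, the walk stops at Tetrapoda, which matches the suggestion in the replies above.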

kokyriakidis commented 5 years ago

I am using the latest Docker image. Running the first command runs the whole pipeline at once, as it says. Should I also run the annotation and evaluation commands, or are these already run by the first command?

macmanes commented 5 years ago

Running the 1st command runs the entire pipeline, including TransRate and BUSCO (with the Euk database). After that, you can annotate or do whatever else you want. Does this make sense?

kokyriakidis commented 5 years ago

Yes, and thank you both very much for this work!

kokyriakidis commented 5 years ago

@macmanes Another question! Can I use several samples together? Or do I have to concatenate their _1 and _2 fastq files?

macmanes commented 5 years ago

Concatenate them all together 1st, but remember the rec for including samples. In general, we strongly recommend that you assemble 1 individual per treatment or group.
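
For the concatenation step, a minimal sketch might look like the following. It assumes uncompressed files named `<prefix>_1.fastq` and `<prefix>_2.fastq`; the function name `concatenate_reads` and the sample prefixes in the usage comment are hypothetical, chosen only for illustration.

```python
import shutil
from pathlib import Path


def concatenate_reads(sample_prefixes, out_prefix):
    """Concatenate <prefix>_1.fastq and <prefix>_2.fastq across samples.

    Samples are appended in the same order for both mates, so read pairs
    stay in sync between the two output files.
    """
    outputs = []
    for mate in ("1", "2"):
        out_path = Path(f"{out_prefix}_{mate}.fastq")
        with open(out_path, "wb") as out:
            for prefix in sample_prefixes:
                with open(f"{prefix}_{mate}.fastq", "rb") as src:
                    # Stream the file in chunks instead of loading it whole.
                    shutil.copyfileobj(src, out)
        outputs.append(out_path)
    return outputs


# Example call (hypothetical sample names):
# concatenate_reads(["normal_A", "affected_A"], "combined")
```

The point of the design is that the sample order is identical for the `_1` and `_2` outputs; if the order differed between mates, the paired reads would no longer line up.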

kokyriakidis commented 5 years ago

@macmanes Could you please explain why that is? Wouldn't biological replicates help with assembling lower-expressed regions?

kokyriakidis commented 5 years ago

@macmanes I have 6 RNAseq libraries (~35M reads each): 3 are normal and 3 are not normal. Should I run the pipeline 3 times for the normal samples and fuse the results with orthofuser, do the same for the other 3, and then fuse the 2 merged assemblies? I have read that going above 40M reads gives little to no improvement. Would using 2 samples, 1 normal and 1 not normal, help to recover transcripts better?

macmanes commented 5 years ago

How about this - try one assembly using my rec - concatenate 2 individuals together (1 normal and 1 not), and then do another experiment where you concatenate all the reads together. See what you get?

Matt

kokyriakidis commented 5 years ago

@macmanes Thank you for your reply! These 6 samples are from 3 pairs of siblings. Do you think I should choose 1 normal sample and its not-normal sibling, or choose one from another family?

macmanes commented 5 years ago

I think I’d try for choosing samples from within a family if possible, but not knowing how different families are, it’s hard to say.

Matt

macmanes-lab / Oyster_River_Protocol

BUSCO datasets #29