Closed mstuart1 closed 5 years ago
To be clear, this is removing regenotypes from only the genepop? If so, I have a script that pulls gen_ids for each ligation_id and then assigns fish to parents/offspring files based on the metadata for the ligation_id. So all we need is for each fish to be represented in the genepop by its best-sequenced ligation_id. Does that make sense? I saw you closed issue 11; can you please comment on that issue with the final number of loci you were able to get? Thanks!
On Tuesday, April 16, 2019, Michelle Stuart notifications@github.com wrote:
In the case of recaptured fish, we observed and measured a fish in a specific year and then captured it again in a different year, when it was potentially a different size, color, or sex.
If I remove all but one observation of the fish, it will make it look like that fish was only ever captured in one year and was only ever one size.
This seems like a loss of valuable information, and the loss of potential offspring or parents because it can only be in one pool.
How do we make sure that a fish that was a juvenile in 2012 but a multi-captured breeding adult in 2017 is in the correct pool for each analysis?
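One possible way to frame this, sketched in Python under stated assumptions: keep every capture in a metadata table keyed by gen_id, separate from the genepop, and build the pools per analysis year from that table. The field names (`year`, `stage`), the `pools_for_year` helper, and the example records are all hypothetical and are not taken from the lab's scripts or database.

```python
from collections import defaultdict

# Hypothetical capture records: one row per observation, several rows per fish.
captures = [
    {"gen_id": 7, "ligation_id": "L0100", "year": 2012, "stage": "juvenile"},
    {"gen_id": 7, "ligation_id": "L0900", "year": 2017, "stage": "adult"},
    {"gen_id": 8, "ligation_id": "L0101", "year": 2012, "stage": "adult"},
]

def pools_for_year(captures, year):
    """Assign each fish observed in `year` to a pool based on that year's record."""
    pools = defaultdict(set)
    for rec in captures:
        if rec["year"] == year:
            # Assumption: juveniles are candidate offspring, everything else a candidate parent.
            pool = "offspring" if rec["stage"] == "juvenile" else "parents"
            pools[pool].add(rec["gen_id"])
    return {name: sorted(ids) for name, ids in pools.items()}

pools_2012 = pools_for_year(captures, 2012)
pools_2017 = pools_for_year(captures, 2017)
```

In this sketch the same gen_id can land in the offspring pool for one year and the parent pool for another, because the pool is derived from the capture record for that year rather than from the single ligation_id kept in the genepop.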
Stopping for the night at figuring out how to identify which gen_id is repeated, which has the most loci, and how to keep only the gen_id with the most loci. Need to pull from the regeno code.
I have a script that does this (summing all the occurrences of “00” for an allele and then choosing the ligation_id with the fewest “00”s). If that’s all that’s left, I’m fine leaving those in because it can be taken care of by the colony prep script I use. Thanks!
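For reference, the selection described above could look roughly like the following Python sketch. It is not the actual script: the function names, the row layout `(gen_id, ligation_id, loci)`, and the example data are hypothetical, and it assumes each locus is a four-character string of two two-digit alleles with "00" marking a missing allele.

```python
def count_missing_alleles(loci):
    """Count alleles coded "00" across all loci of one genotype."""
    missing = 0
    for locus in loci:
        alleles = (locus[:2], locus[2:])  # assumes two-digit allele coding
        missing += sum(allele == "00" for allele in alleles)
    return missing

def best_ligation_per_fish(rows):
    """Return {gen_id: ligation_id}, keeping the ligation with the fewest "00" alleles."""
    best = {}
    for gen_id, ligation_id, loci in rows:
        n_missing = count_missing_alleles(loci)
        # Ties keep the ligation_id seen first.
        if gen_id not in best or n_missing < best[gen_id][0]:
            best[gen_id] = (n_missing, ligation_id)
    return {gen_id: ligation_id for gen_id, (_, ligation_id) in best.items()}

# Hypothetical example: fish 1 was genotyped twice, fish 2 once.
rows = [
    (1, "L0001", ["0102", "0000", "0101"]),
    (1, "L0457", ["0102", "0201", "0101"]),
    (2, "L0310", ["0000", "0000", "0102"]),
]
kept = best_ligation_per_fish(rows)
```

Here fish 1 keeps `L0457`, which has no missing alleles, and fish 2 keeps its only ligation even though most loci are missing, which matches leaving such cases for the downstream prep step.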
There is a single line in dplyr that does this; I just needed to make a note for myself at the end of the day about where I left off.
Pausing to deal with genomics #10.