miRTop / incubator

Where all ideas and discussions happen to lead to new repositories

Read name #24

Open nikola-tesic opened 5 years ago

nikola-tesic commented 5 years ago

Hi miRTop guys,

I have a question - what should be the contents of 'Read'? Also, I was reading up on GFF3 documentation here, and it seems that all 'additional' attributes should be lowercase. Does that mean that 'Read' should be lowercase as well?

Best, Nikola

lpantano commented 5 years ago

Hi Nikola,

thanks a lot for this!

Actually, you are right, which makes me think we should follow the same rule for the miRNA format. Since this format is an adaptation of GFF3, it seems right to keep the uppercase rule for the attributes we define here and let people add other attributes, which should start with a lowercase letter.

I'll add this as a check in the validator and explain it in the documentation.
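A minimal sketch of the casing check such a validator could run, assuming the attribute column is parsed GFF3-style as semicolon-separated `key=value` pairs. The function name `attribute_case_errors` and the `FORMAT_ATTRIBUTES` set are hypothetical: the set below is an illustrative subset, not the actual list of attributes defined by the format.

```python
def attribute_case_errors(column9: str, defined: set[str]) -> list[str]:
    """Check the casing rule on a GFF3-style attribute column.

    Attributes defined by the format (passed in `defined`) keep their
    uppercase names; any other attribute must start with a lowercase letter.
    """
    errors = []
    for field in column9.strip().strip(";").split(";"):
        if not field.strip():
            continue  # tolerate empty fields such as a doubled ';'
        key, sep, _value = field.partition("=")
        key = key.strip()
        if not sep or not key:
            errors.append(f"malformed attribute: {field!r}")
        elif key not in defined and key[0].isupper():
            errors.append(f"user-defined attribute {key!r} should start with lowercase")
    return errors


# Hypothetical subset of format-defined attributes, for illustration only.
FORMAT_ATTRIBUTES = {"Read", "Name"}
print(attribute_case_errors("Read=seq_1_x2;Name=hsa-miR-1;Sample=s1;depth=5", FORMAT_ATTRIBUTES))
# ["user-defined attribute 'Sample' should start with lowercase"]
```

Keeping the defined set as a parameter means the check does not hard-code the format's attribute list, which may change between versions of the definition.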

Thanks a lot for chiming in here! This actually is a very good point and we need to be clearer!

cheers

(Replied by email to Nikola Tesic's comment of Jul 29, 2018, 08:43.)

nikola-tesic commented 5 years ago

No problem, it's nice to have a unified format. Also, what should be the contents of 'Read'? The read name from the FASTQ file, or some generated read name?

lpantano commented 5 years ago

Thanks for the question!

It should be the read name that was used when mapping to miRNAs.

It could be the read name from the FASTQ file, but tools normally de-duplicate reads: identical sequences that appear more than once in the FASTQ file are collapsed into a single sequence whose name contains the count. For instance,

@name2_in_fastq
AAAAAAAAAAAAAAAAA
@name1_in_fastq
AAAAAAAAAAAAAAAAA

It will be:

@name_x2
AAAAAAAAAAAAAAAAA

The bottom line is that whatever name was used when mapping to the miRNAs/precursors is the name to use. This field is not mandatory, but it is useful for debugging purposes.
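The collapsing step described above can be sketched roughly as follows, assuming standard 4-line FASTQ records. The functions `read_fastq` and `collapse_reads` and the `seq_<i>_x<count>` naming scheme are hypothetical; the thread only says the collapsed name contains the count, and real tools use their own naming conventions.

```python
from collections import Counter


def read_fastq(text: str) -> list[tuple[str, str]]:
    """Parse 4-line FASTQ records into (read_name, sequence) pairs."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    pairs = []
    for i in range(0, len(lines) - 3, 4):
        header, sequence = lines[i], lines[i + 1]
        # Drop the leading '@' and anything after the first whitespace.
        pairs.append((header[1:].split()[0], sequence))
    return pairs


def collapse_reads(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Reduce identical sequences to one entry whose name carries the count.

    The seq_<i>_x<count> naming scheme is hypothetical; it only mimics the
    name_x2 style described in the thread.
    """
    counts = Counter(sequence for _name, sequence in pairs)
    # Counter keeps first-seen order, so numbering follows the input file.
    return [
        (f"seq_{i}_x{n}", sequence)
        for i, (sequence, n) in enumerate(counts.items(), start=1)
    ]


fastq = """@name2_in_fastq
AAAAAAAAAAAAAAAAA
+
IIIIIIIIIIIIIIIII
@name1_in_fastq
AAAAAAAAAAAAAAAAA
+
IIIIIIIIIIIIIIIII
"""
for name, sequence in collapse_reads(read_fastq(fastq)):
    print(f"@{name}\n{sequence}")
```

After this step the mapper only ever sees `seq_1_x2`, so that collapsed name, not either original FASTQ name, is what would go into `Read`.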

Cheers

(Replied by email to Nikola Tesic's comment of Jul 31, 2018, 8:55 AM: https://github.com/miRTop/incubator/issues/24#issuecomment-409210602)

nikola-tesic commented 5 years ago

Thank you! So, in your example, would it be OK to have 'Read=name2_in_fastq,name1_in_fastq'? Also, which attribute fields are mandatory and which are optional?

lpantano commented 5 years ago

Hi,

That would make a very long Read value, since some sequences are repeated millions of times.

So, if you take the read names from the raw FASTQ files and no collapsing happens before the GFF3 is generated, I'd suggest adding only the first name.
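A small sketch of that suggestion, assuming uncollapsed reads are available as `(read_name, sequence)` pairs in file order. The function `read_attributes` is hypothetical. Besides length, note that in GFF3 a comma inside an attribute value separates multiple values, so `Read=name2_in_fastq,name1_in_fastq` would be read as a multi-valued attribute.

```python
def read_attributes(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map each distinct sequence to a Read=... value holding one read name.

    Only the first name seen for a sequence is kept; joining every name
    could mean millions of names for a highly expressed miRNA.
    """
    first_name: dict[str, str] = {}
    for name, sequence in pairs:
        first_name.setdefault(sequence, name)  # later duplicates are ignored
    return {sequence: f"Read={name}" for sequence, name in first_name.items()}


reads = [
    ("name2_in_fastq", "AAAAAAAAAAAAAAAAA"),
    ("name1_in_fastq", "AAAAAAAAAAAAAAAAA"),
]
print(read_attributes(reads))  # {'AAAAAAAAAAAAAAAAA': 'Read=name2_in_fastq'}
```

"First" here means first in file order, which keeps the result deterministic for a given input.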

Thanks!

PS: https://github.com/miRTop/mirGFF3/blob/master/definition.md#columns lists the optional versus required fields (see the O or R before each field).

(Replied by email to Nikola Tesic's comment of Aug 1, 2018, 12:28 PM: https://github.com/miRTop/incubator/issues/24#issuecomment-409635618)