merenlab / anvio.org

The anvi'o community web page

Contribute to anvi'o 'omics vocabulary #3

Open meren opened 2 years ago

meren commented 2 years ago

'Omics terms can be difficult to learn. Having them described on a single page may help newcomers overcome this barrier quickly, and transfer know-how that may be a pain to transfer through more formal means. But developing a vocabulary is indeed a challenge, and certainly requires input from the community that uses these terms.

To address these issues, @ivagljiva and I have started a vocabulary page here:

https://anvio.org/vocabulary/

Now it is mature enough for us to ask for your help, if you are interested in contributing to it. So the purpose of this issue is to detail how one can contribute to the 'omics vocabulary.

Once you do it, you will be eligible to get some stickers from us by filling out this simple form:

https://forms.gle/kQTCysGDMtXe16AZ8

Yes, we certainly are in the territory of quid pro quo, and we are proud of it. Thank you in advance for playing along and considering contributing!

What is this, a vocabulary for ants?

Yes, well, no, it is not for ants, but we do know it is indeed very tiny. That's why we decided to open it to the community so it can grow, and we look forward to your input to improve it. For instance, here is an incomplete list of terms that could benefit from definitions by you:

Fine, I'll contribute. But how can I do it?

🎊

You can contribute by adding and defining new terms, or by improving existing definitions. From a technical standpoint, contributing is straightforward and can be done in multiple ways.

First, please read through the vocabulary page to get an idea of its language and approach to definitions. Once you know which terms you want to add or which definitions you want to improve, naturally you will want to edit the page. The content behind the vocabulary page is here:

https://github.com/merenlab/anvio.org/blob/main/vocabulary/index.md

Which means, this is the actual source file that is rendered and displayed at https://anvio.org/vocabulary/, and it is the file whose contents must be updated. There are two ways to make a contribution: (1) send us a pull request (which is our preferred way), or (2) send us a comment. Both options are detailed below. For the first one you will need a GitHub account. For the second one you will simply need git installed on your system (if you have anvi'o installed on your computer, you have git installed on it, too).

Send a pull request

The best way is to prepare a "Pull Request" for the GitHub repository, because it will forever associate your changes with your username. For this you will need to,

This is a very standard workflow on GitHub, and there are a lot of resources online like this one to learn how to craft and send a pull request. While I know it sounds boring and difficult if you don't already know it, learning this is a great way to get your foot in the door to make more contributions to open-source projects, and to understand the power of version tracking.
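For orientation, here is a minimal sketch of what a common fork-based pull request workflow can look like from the terminal. It is not necessarily the exact set of steps the maintainers have in mind. It assumes you have already forked merenlab/anvio.org through the GitHub website; the function name `prepare_pull_request`, the username argument, and the default branch name `vocabulary-update` are all made up for illustration.

```shell
# A sketch only: assumes your fork lives at github.com/<your-username>/anvio.org.
# $1: your GitHub username (hypothetical placeholder)
# $2: a branch name of your choosing (defaults to a made-up name)
prepare_pull_request() {
    local github_user="$1"
    local branch="${2:-vocabulary-update}"

    # Get a local copy of your fork
    mkdir -p ~/github/ || return 1
    cd ~/github/ || return 1
    git clone "https://github.com/${github_user}/anvio.org.git" || return 1
    cd anvio.org || return 1

    # Keep your edits on their own branch
    git checkout -b "${branch}" || return 1

    # The remaining steps happen after you edit the file, so just print them
    echo "Now edit vocabulary/index.md, then run:"
    echo "  git add vocabulary/index.md"
    echo "  git commit -m 'A short description of your change'"
    echo "  git push origin ${branch}"
}
```

You would call it as `prepare_pull_request your-username`, make your edits, run the printed commands, and then open the pull request from your fork's page on GitHub.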

Send your changes as a comment

If you don't want to deal with pull requests and all, you still can make a contribution very easily.

Go to your terminal, and type these commands to get a copy of the anvi'o web repository:

mkdir -p ~/github/
cd ~/github/
git clone https://github.com/merenlab/anvio.org.git

Then open this file in your text editor (perhaps something like Atom or MacDown, and certainly not Microsoft Word, TextEdit, or anything like them):

~/github/anvio.org/vocabulary/index.md

Once you are done with your edits, save your changes, and in your terminal type these commands:

cd ~/github/anvio.org/
git diff vocabulary/index.md

Copy the entire output, and come back to this issue, and paste it as a comment with a small description of what you have done. If you copy-paste this and then replace the text with your output, it will even look good:

``` diff
YOUR OUTPUT GOES HERE
```

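If you would rather not wrap the output by hand, the following sketch produces the ready-to-paste comment body for you. The helper name `make_diff_comment` is made up for illustration, and it assumes you run it from the top of your `anvio.org` clone.

```shell
# A sketch only: wraps the output of git diff in the comment template above.
# Run it from ~/github/anvio.org/ after saving your edits.
make_diff_comment() {
    local diff_output
    diff_output="$(git diff vocabulary/index.md)"

    # Nothing to paste if the file has no unsaved-to-git changes
    if [ -z "${diff_output}" ]; then
        echo "No changes found in vocabulary/index.md" >&2
        return 1
    fi

    # Print the opening fence, the diff itself, and the closing fence
    printf '``` diff\n%s\n```\n' "${diff_output}"
}
```

Running `make_diff_comment` prints the fenced diff to your terminal, which you can then copy into your comment on this issue.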
# What happens after I contribute?

If your contribution gets into the vocabulary page, we will ask you to go fill out [this form](https://forms.gle/kQTCysGDMtXe16AZ8). After that, we will send you a sticker, and list your name on the vocabulary page 😇

**Thank you very much for your help in advance**.

---

# WAIT, I have more questions!

Of course you do!

### How much contribution is enough?

ANY contribution is. Anything you change that ends up on the page *is* a contribution. We of course would like to see new terms, better definitions, etc., but you can also game the system for stickers.

### So I can make as many tiny contributions as I want and get so many stickers?

To conserve energy, we only promise to send one sticker per contribution, and no more than 4 letters to the same address :) Do whatever you wish to do with this EXTREMELY KEY information :p After 4, we will send our thanks. Virtually.

### Who decides whether a contribution ends up on the page or not?

Since someone has to take this responsibility on themselves, Meren will do it for now and forever unless someone else wants to do it.

jsevereyn commented 2 years ago

I added some (probably super basic) concepts, which are the ones I get asked about the most. I'm giving some basic lessons in genomics for geology students (they have very little background in biological sciences).

+### Indel
+
+Stands for insertion-deletion: the insertion or deletion of some bases in a DNA sequence. Indels are considered short polymorphisms, and they differ from point mutations, as the latter are substitutions.
+
+### Protein Structure
+
+The three-dimensional arrangement, shaping, and folding of atoms in an amino acid chain molecule (at different levels) to form a functional protein, which can be monomeric or aggregated into homo- or hetero-polymers.
+
+### Shotgun sequencing
+
+The sequencing of all the nucleotide sequences present in a given sample, after they are randomly fragmented into short pieces and ligated to known fragments during sequencing library construction. It's called shotgun sequencing because a large sequence is essentially broken up into many, many smaller pieces, similar to how a shotgun shell breaks apart when fired.
+
+### Amplicon Sequencing
+
+(AKA: 16S, ITS, metabarcoding, metataxonomics). The sequencing of PCR-targeted fragments of interest. It relies on specific primers (generally degenerate) for a marker of interest, commonly one that resolves taxonomic classification well (as it should have a variable region flanked by conserved ones).
+
+### Sequence Alignment
+
+A technique used to identify (at quantifiable levels) regions of similarity between sequences, which can be nucleotide (DNA, RNA) or amino acid (protein) sequences. These similarities may indicate functional, structural, and/or evolutionary relationships between the biological sequences. Aligned sequences are commonly represented as rows within a matrix. Sequence alignments can be used to calculate distances c
+
+### Mapping
+
+Read mapping is the process of aligning reads to reference genomes and/or sequences. Mapping tools take a reference and a set of reads as input, and align the reads one by one, allowing some degree of mismatches, indels, and clipping of short fragments at one or both ends of the reads. This technique maps the positions of reads that are easily recognisable and only occur once in the reference.
+
+### COGs
+
+The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at phylogenetic classification of the proteins encoded in a given set of data, to help simplify functional categorization using a controlled vocabulary. Each COG includes proteins that are inferred to be orthologs (direct evolutionary counterparts), maximising their usefulness for functional and evolutionary studies.
+
+### Annotation
+
+The process of identifying functional elements along genome sequences, thus giving meaning to them, either by identifying known elements or by comparing against databases through different analysis, comparison, estimation, precision, and other mining techniques that derive the structural and functional information of a protein or gene. This is an essential step, as DNA sequencing generates sequence information without its functional role.

meren commented 2 years ago

Dear @jsevereyn, thank you for these suggestions. Probably because they're coming from teaching material, some of them partially overlap with existing terms in the dictionary (e.g., mapping or shotgun sequencing), and others are very much needed, but need more 'encyclopedia'-like descriptions (e.g., indels). So these are not yet quite at a level of copy-paste convenience :)

jsevereyn commented 2 years ago

I figured, but maybe this could inspire others to make a substantial contribution by improving those needed ones

Thanks for the reply @meren