snorkel-team / snorkel

A system for quickly generating training data with weak supervision
https://snorkel.org
Apache License 2.0
5.81k stars, 857 forks

In Concat operator make one optional, link words? #262

Closed ajratner closed 8 years ago

ajratner commented 8 years ago

In general: more flexibility

@ThomasPalomares @tberardini any examples?

tberardini commented 8 years ago

The mutant plants described in the Arabidopsis literature may have one or more mutated genes. The text names all of the mutated genes, and the phenotype is only relevant when linked to all of them (two, three, or n genes) at the same time, so we want all the relevant genes extracted together as a single unit during candidate extraction.

Example sentences:

(1) gene names separated by '/'

" At the fully developed ovule stage the spl/nzz mutant phenotype resembles that of ashh2 , but in contrast to ashh2 , spl/nzz is characterized by the absence of a MMC at an early stage of ovule development , thus preventing embryo sac formation . "

spl/nzz = two genes, spl and nzz

phenotype = absence of MMC at early stage of ovule development

phenotype = embryo sac formation prevented

(2) Gene names concatenated, no space in between them.

To investigate the level of membrane lipid peroxidation , the same leaves were analyzed for MDA content ( malondialdehyde , a byproduct of lipid peroxidation ) : chy1chy2lut2lut5 and chy1chy2lut5 leaves showed higher accumulation of MDA upon stress treatment ( +120 % and +45 % , respectively ) , thus a far higher level of lipid peroxidation with respect to wild-type and lut2 plants ( +25 % ) ; chy1chy2lut2lut5 plants showed a far higher photosensitivity in high-light than chy1chy2lut5 ( Figure ) ; the latter was the xanthophyll mutant with the highest light sensitivity described so far [ ] .

chy1chy2lut2lut5 = four genes, chy1, chy2, lut2, lut5

chy1chy2lut5 = three genes, chy1, chy2, lut5

(3) Gene names listed one after another, separated by a space.

The presence of these residues may explain the small amount of GBSS detected on the ptst starch granules ( ) , which increased when starch was always present ( as seen in the ptst sex4 double mutant ; .

ptst sex4 = two genes, ptst and sex4
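The three formats above could be handled by a single grouping pass over the tokenized sentence. The following is only a minimal sketch, not how the extraction tool does it: the `GENES` set is a hypothetical stand-in for a real gene dictionary, and the function names `split_concatenated` and `gene_units` are made up for illustration. It assumes the sentence is already tokenized on whitespace, as in the examples, and that adjacent gene tokens always belong to the same mutant.

```python
# Hypothetical gene dictionary; a real one would come from a curated resource.
GENES = {"spl", "nzz", "ashh2", "chy1", "chy2", "lut2", "lut5", "ptst", "sex4"}


def split_concatenated(token, genes):
    """Segment a token into known gene names; return None if that is impossible."""
    # best[i] holds one segmentation of token[:i], or None if there is none.
    best = [None] * (len(token) + 1)
    best[0] = []
    for end in range(1, len(token) + 1):
        for start in range(end):
            if best[start] is not None and token[start:end] in genes:
                best[end] = best[start] + [token[start:end]]
                break
    return best[len(token)]


def gene_units(tokens, genes=GENES):
    """Group gene mentions into units, one list of gene names per mutant."""
    units, current = [], []
    for tok in tokens:
        # Format (1): split on '/'; format (2): segment each piece by dictionary.
        segments = [split_concatenated(p, genes) for p in tok.lower().split("/")]
        if all(segments):  # every piece resolved to at least one gene
            # Format (3): consecutive gene tokens extend the current unit.
            current.extend(g for seg in segments for g in seg)
        elif current:
            units.append(current)
            current = []
    if current:
        units.append(current)
    return units


print(gene_units("the spl/nzz mutant phenotype resembles that of ashh2".split()))
print(gene_units("chy1chy2lut2lut5 and chy1chy2lut5 leaves".split()))
print(gene_units("as seen in the ptst sex4 double mutant".split()))
```

Dictionary-based segmentation is used for format (2) because a regular expression cannot know where one gene name ends and the next begins. Ambiguous concatenations (a string that can be segmented in more than one way) would need extra handling that this sketch does not attempt.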

ajratner commented 8 years ago

Thanks for this!! Updates soon. (In reply to Tanya Berardini's comment of Tue, Jun 21, 2016, 3:44 PM.)

ThomasPalomares commented 8 years ago

Thanks Tanya !

Just two clarifications regarding the phenotypes in the first example: the dictionaries contain the parts 'absence of MMC', 'early stage' and 'ovule development', but we want to catch the whole 'the absence of a MMC at an early stage of ovule development' as one candidate.

The same goes for the second phenotype: the dictionary contains 'embryo sac formation', and we have a quality dictionary with the word 'prevent' or 'prevented' in it, and we want to capture 'preventing embryo sac formation'.
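One way to picture the behavior requested here is to merge neighboring dictionary matches whenever only link words separate them. The code below is a rough sketch under stated assumptions, not the project's Concat operator: `find_spans`, `merge_spans`, `LINK_WORDS`, and the phrase list are all hypothetical. The phrase list is also simplified so that exact matching works on the example sentence ('absence' and 'MMC' as separate entries, 'preventing' as a literal entry), which differs from the dictionaries described above.

```python
# Hypothetical set of link words that may sit between two dictionary matches.
LINK_WORDS = {"a", "an", "at", "of", "the"}


def find_spans(tokens, phrases):
    """Return (start, end) token spans, end exclusive, for exact phrase matches."""
    spans = []
    for phrase in phrases:
        words = phrase.split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                spans.append((i, i + len(words)))
    return spans


def merge_spans(tokens, spans, link_words=LINK_WORDS):
    """Merge spans that touch, overlap, or are separated only by link words."""
    merged = []
    for start, end in sorted(spans):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = tokens[prev_end:start]  # empty when the spans touch or overlap
            if all(t.lower() in link_words for t in gap):
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


sentence = ("spl/nzz is characterized by the absence of a MMC at an early stage "
            "of ovule development , thus preventing embryo sac formation .")
tokens = sentence.split()
# Simplified, illustrative dictionary entries (see the assumptions above).
phrases = ["absence", "MMC", "early stage", "ovule development",
           "preventing", "embryo sac formation"]

candidates = [" ".join(tokens[s:e])
              for s, e in merge_spans(tokens, find_spans(tokens, phrases))]
print(candidates)
```

Restricting the gap to link words, rather than allowing any gap up to a fixed length, keeps the two phenotypes in this sentence apart, since ', thus' is not in the link-word set. The leading 'the' is not included in the merged span; extending a span leftward over link words would be a separate choice.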