petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

Mixtures of compounds #30

Open deadlyvices opened 4 years ago

deadlyvices commented 4 years ago

Something that has just occurred to me is that we need to address the handling of mixtures of compounds. Essential oils are classic examples. Alex Clark has done some work on this already; he claims that no existing file format handles mixtures, and he has developed a tool which he says addresses this: https://cheminf20.org/2018/08/27/mixtures-cheminformatics/ I suspect that, contrary to Clark's intuition, CML is more than up to the job. I'd like to kick off a discussion about the representation of mixtures in CML and solicit some initial suggestions about how to tackle these.

deadlyvices commented 4 years ago

Here's a link to Clark's work: https://link.springer.com/article/10.1186/s13321-019-0357-4

petermr commented 4 years ago

Yes, CML can hold mixtures without problem, and the components can be annotated. Use occupancy to give the ratios of molecules.

(Sent by email in reply to Clyde Davies's post of Sat, 5 Oct 2019, 17:41.)

deadlyvices commented 4 years ago

Thinking about this some more, I hit on a useful analogy from my development work. Say, for the sake of argument, you're building an interface in WPF and you want to use a Grid layout. Grids have RowDefinitions and ColumnDefinitions. These have Height and Width properties.

Now, when it comes to sizing these, you can use absolute or relative sizing. Absolute means you just specify a value such as Width="20". Relative sizing means that you have an arbitrary unit, the 'star', and you specify values such as "3*" or "5*". So what you can end up with is a mixed definition:

<Grid.ColumnDefinitions>
    <ColumnDefinition Width="50.5" />  <!-- Fixed width: 50.5 device units -->
    <ColumnDefinition Width="69*" />   <!-- Take 69% of remainder -->
    <ColumnDefinition Width="31*" />   <!-- Take 31% of remainder -->
</Grid.ColumnDefinitions>

You can see how this approach would be ideally applicable to mixtures of compounds! You could specify absolute concentrations, such as '0.05M n-BuLi', and then relative proportions, such as '1:3 THF/hexanes', mixing absolute and 'star' values in the same definition. 'Hexanes' itself would be a mixture in its own right. So, I suppose the next question is: does CML support the idea of relative and absolute occupancies?
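
To make the analogy concrete, here is a minimal Python sketch of how mixed absolute and 'star' amounts could be resolved into fractions of the whole, the way a WPF Grid resolves column widths. The names `Share` and `resolve` and the 0.2 "solute" fraction are hypothetical and chosen only for illustration. The sketch also assumes every absolute amount is already a fraction of the whole; a molar concentration such as 0.05M would need converting first, which is not attempted here.

```python
from dataclasses import dataclass


@dataclass
class Share:
    """One component's share, in the spirit of a WPF column width."""
    name: str
    value: float
    relative: bool = False  # True means 'star' sizing: a weight on the remainder


def resolve(shares, total=1.0):
    """Return {name: fraction}: absolute shares are taken first, stars split the rest."""
    fixed = sum(s.value for s in shares if not s.relative)
    if fixed > total:
        raise ValueError("absolute shares exceed the total")
    weight = sum(s.value for s in shares if s.relative)
    remainder = total - fixed
    return {
        s.name: (remainder * s.value / weight) if s.relative else s.value
        for s in shares
    }


mixture = [
    Share("solute", 0.2),                # absolute: 20% of the whole (made-up value)
    Share("THF", 1, relative=True),      # 1* of the remainder
    Share("hexanes", 3, relative=True),  # 3* of the remainder
]
fractions = resolve(mixture)
```

With these inputs the two solvents share the remaining 80% in a 1:3 ratio, just as the two star columns share the width left over after the fixed column.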

petermr commented 4 years ago

Thanks - probably also possible in HTML5/CSS.

On Sun, Oct 6, 2019 at 10:55 AM Clyde Davies wrote:

> You can see how this approach would be ideally applicable to mixtures of compounds!

The frequency tables are in percentages, so we actually need something like this (NOTE - I have forgotten chunks of CML):

[CML element markup not preserved; only the trailing comments survive]
// does this exist?
// 15 percent
// 50 percent
// calculate the remainder

We have certainly done this and published it but I can't remember where!

-- Peter Murray-Rust, Founder ContentMine.org and Reader Emeritus in Molecular Informatics, Dept. of Chemistry, University of Cambridge, CB2 1EW, UK

deadlyvices commented 4 years ago

It's called moleculeList in CML. But yes, that would work. And if we want to specify ratios we could use something like

<moleculeList>
  <molecule occupancy="0.15" idref="mols:m1"/> <!-- 15 percent -->
  <molecule occupancy="0.50" idref="mols:m2"/> <!-- 50 percent -->
  <moleculeList occupancy="*">
     <molecule occupancy="75" idref="mols:m3"/> <!-- say n-hexane -->
     <molecule occupancy="25" idref="mols:m4"/> <!-- say 2-methyl-pentane -->
  </moleculeList>
</moleculeList>

to specify ratios of remainders
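
As a rough check that this layout can be resolved mechanically, here is a Python sketch that flattens the nested list into one fraction per molecule. The semantics are an assumption drawn from this thread and not from the CML schema: within a list that contains a "*" child, numeric occupancies are absolute fractions and the star takes what is left; within a list with no star, the numbers are treated as ratios and normalised. The function name `flatten` is hypothetical, and the CML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

CML = """
<moleculeList>
  <molecule occupancy="0.15" idref="mols:m1"/>
  <molecule occupancy="0.50" idref="mols:m2"/>
  <moleculeList occupancy="*">
    <molecule occupancy="75" idref="mols:m3"/>
    <molecule occupancy="25" idref="mols:m4"/>
  </moleculeList>
</moleculeList>
"""


def flatten(node, share=1.0, out=None):
    """Map each molecule idref to its fraction of the outermost mixture."""
    out = {} if out is None else out
    children = list(node)
    occ = [c.attrib["occupancy"] for c in children]
    stars = [o.endswith("*") for o in occ]
    if any(stars):
        # Numbers are absolute fractions; starred children split the remainder.
        fixed = sum(float(o) for o, s in zip(occ, stars) if not s)
        weights = [float(o[:-1] or 1) if s else 0.0 for o, s in zip(occ, stars)]
        rest = 1.0 - fixed
        fracs = [rest * w / sum(weights) if s else float(o)
                 for o, s, w in zip(occ, stars, weights)]
    else:
        # No star in this list: treat the numbers as ratios and normalise them.
        total = sum(float(o) for o in occ)
        fracs = [float(o) / total for o in occ]
    for child, frac in zip(children, fracs):
        if child.tag == "moleculeList":
            flatten(child, share * frac, out)
        else:
            out[child.get("idref")] = share * frac
    return out


fractions = flatten(ET.fromstring(CML))
```

Under these assumed rules the nested list receives the remaining 35%, which is then split 75:25 between its two molecules.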

petermr commented 4 years ago

Good. We may have to tweak the semantics of occupancy. We can in principle have variables as well as numbers.

(Sent by email in reply to Clyde Davies's comment of Sun, 6 Oct 2019, 15:21.)

deadlyvices commented 4 years ago

How do variables work in specifying occupancy? Not sure I can see this.

petermr commented 4 years ago

From memory it's something like

0.1 ... ...

But the syntax is probably wildly off. I'd look in the CMLPolymer paper to start with...

P.

BTW I am always ready for Skype/Hangouts calls at short notice. I am online most of the time, although I also do exercises.

I am really keen to get automatic extraction done. I will investigate Stanford NLP. Have opened a new issue for this.

(Sent by email in reply to Clyde Davies's comment of Mon, Oct 7, 2019 at 10:50 AM.)

deadlyvices commented 4 years ago

Just seen the new issue. I have dabbled with the Stanford software, but only really scratched the surface. I'd suggest we tackle variable occupancies later. There's enough work to be getting on with.

petermr commented 4 years ago

On Mon, Oct 7, 2019 at 12:32 PM Clyde Davies wrote:

> I'd suggest we tackle variable occupancies later. There's enough work to be getting on with

Agreed. In CEV/DAVE we'll be reading in known fixed values.
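
For the fixed-value case, a minimal Python sketch of writing a percentage table out as a `moleculeList` might look like this. The function name `to_molecule_list`, the table values, and the choice to store occupancy as a four-decimal fraction are all assumptions for illustration; the element and attribute names simply follow the example earlier in this thread and have not been checked against the CML schema.

```python
import xml.etree.ElementTree as ET


def to_molecule_list(percentages):
    """Build a <moleculeList> from {idref: percent}, storing occupancy as a fraction."""
    if sum(percentages.values()) > 100 + 1e-9:
        raise ValueError("percentages add up to more than 100")
    root = ET.Element("moleculeList")
    for ref, pct in percentages.items():
        ET.SubElement(root, "molecule", occupancy=f"{pct / 100:.4f}", idref=ref)
    return root


# Hypothetical table values; real ones would come from the mined literature.
table = {"mols:m1": 15.0, "mols:m2": 50.0}
xml_text = ET.tostring(to_molecule_list(table), encoding="unicode")
```

Anything not listed in the table is simply left unstated here; whether the remainder should be recorded explicitly is a question for the occupancy semantics discussed above.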
