rcsb / mmtf-java

The java implementation of the MMTF API, decoder and encoder.
http://mmtf.rcsb.org/
Apache License 2.0

simple alternative needed #22

Open BobHanson opened 7 years ago

BobHanson commented 7 years ago

I'm not seeing how one would integrate this mmtf-java package into a working Java program such as Jmol. It has a very large set of dependencies, including

org.msgpack.jackson.dataformat, fasterxml.jackson.annotation, fasterxml.jackson.core, fasterxml.jackson.databind

This amounts to over 5 MB of code and over 500 classes.

The decoding task, at least, is not at all that difficult. I have implemented it in Jmol using a very simple class that has only three generic dependencies (a byte array converter, a binary document reader, and an efficient JavaScript-compatible StringBuffer equivalent). See Jmol's MessagePackReader
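The "byte array converter" mentioned above mostly comes down to reading 4-byte integers. That same operation is all you need to parse the 12-byte header (codec number, decoded length, codec parameter) that, as we read the MMTF spec, precedes every encoded binary array and is stored big-endian. Below is a minimal sketch. The class name MmtfBytes is hypothetical. bytes4ToInt32 only mirrors the name and argument order of the helper called in the Type 9 snippet further down; it is not Jmol's actual source.

```java
// Hypothetical helper class, not Jmol's MessagePackReader code.
final class MmtfBytes {
    private MmtfBytes() {}

    /** Reads a 4-byte signed integer starting at byte offset off. */
    static int bytes4ToInt32(byte[] b, int off, boolean bigEndian) {
        if (bigEndian) {
            return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
                 | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
        }
        return ((b[off + 3] & 0xff) << 24) | ((b[off + 2] & 0xff) << 16)
             | ((b[off + 1] & 0xff) << 8) | (b[off] & 0xff);
    }

    /** Returns {codec, length, param} from the 12-byte header of an MMTF binary array. */
    static int[] readHeader(byte[] b) {
        return new int[] {
            bytes4ToInt32(b, 0, true), // codec (strategy) number
            bytes4ToInt32(b, 4, true), // length of the decoded array
            bytes4ToInt32(b, 8, true)  // codec-specific parameter, e.g. a divisor
        };
    }
}
```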

I offer this code as a possible very lightweight alternative to what is presently on this site (4 files total; under 20K total for either .class or .js files).

So perhaps I am just suggesting development of a similar "mmtf-java-decode-lite".

Bob Hanson

josemduarte commented 7 years ago

True, the dependencies add a lot of extra code and might seem complex, but at the same time we get solid implementations, where experts in each topic (be it MessagePack or something else) have gone through a few release cycles, thought about a good design, and fixed bugs and issues that a much larger community has encountered along the way.

In general our philosophy is to use libraries and off-the-shelf components when those are available, in order to avoid getting into the same traps that others got in before. In my opinion, a larger package size is not a big price to pay for all that.

pwrose commented 7 years ago

I agree in general. However, we need to check whether the Jackson library is serializable; otherwise it won't work in our Spark applications.


Peter Rose, Ph.D. Site Head, RCSB Protein Data Bank West (http://www.rcsb.org) San Diego Supercomputer Center (http://bioinformatics.sdsc.edu) University of California, San Diego +1-858-822-5497

BobHanson commented 7 years ago

I understand. But these are really simple functions.

Maybe my use case (a program that needs to be as compact as possible and to run extremely efficiently in both Java and transpiled JavaScript) is unusual. I don't know. Jmol/JSmol has to be so efficient in both Java and JavaScript in all respects that I rarely have the luxury of just pulling code off the shelf and using it.

In any case, if you would expose one class with a few simple Java methods, as I am doing in Jmol, or one .js file with all that is needed, I think it would be much appreciated. The specs are clear enough and so well written that I was able to implement these without any reference code. Even so, it might be nice to include code snippets showing working examples. For example, Type 9 is:

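    // Decodes MMTF codec type 9 (integer & run-length encoded float array).
    // b: raw bytes including the 12-byte header; n: decoded length; divisor: header parameter.
    // bytes4ToInt32(bytes, byteOffset, bigEndian) reads a 4-byte int (helper not shown).
    public static float[] rldecodef(byte[] b, int n, float divisor) {
      float[] ret = new float[n];
      // pt counts 4-byte words; words 0-2 are the header, so data starts at word 3
      for (int i = 0, pt = 3; i < n;) {
        int val = bytes4ToInt32(b, (pt++) << 2, true);               // run value
        for (int j = bytes4ToInt32(b, (pt++) << 2, true); --j >= 0;) // run length
          ret[i++] = val / divisor;
      }
      return ret;
    }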

It might be hard to see that from what is written there.
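For readers who want to run the Type 9 snippet above, here is a self-contained sketch. Unlike the snippet, it reads the decoded length and the divisor from the 12-byte header instead of taking them as arguments. The class name Type9Decoder is hypothetical, its local bytes4ToInt32 is a big-endian-only stand-in for the helper the snippet calls, and the input is assumed to be well formed.

```java
// Hypothetical self-contained variant of the Type 9 decoder.
final class Type9Decoder {
    private Type9Decoder() {}

    /** Reads a big-endian 4-byte int at byte offset off. */
    static int bytes4ToInt32(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    /**
     * After the 12-byte header come int32 (value, count) run-length pairs;
     * each value is divided by the divisor stored in the header's parameter field.
     */
    static float[] decode(byte[] b) {
        int n = bytes4ToInt32(b, 4);         // decoded length
        float divisor = bytes4ToInt32(b, 8); // header parameter
        float[] ret = new float[n];
        // pt counts 4-byte words; words 0-2 are the header
        for (int i = 0, pt = 3; i < n; pt += 2) {
            int val = bytes4ToInt32(b, pt << 2);
            int count = bytes4ToInt32(b, (pt + 1) << 2);
            for (int j = 0; j < count && i < n; j++) ret[i++] = val / divisor;
        }
        return ret;
    }
}
```

Taking the divisor from the header saves the caller from passing it separately. Bob's version takes n and divisor as parameters, presumably because his reader has already parsed the header at that point.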

If you do want to include all those libraries as they are, would it be possible to explain to people exactly how to implement them? I could not figure it out myself. What I saw was a huge spider web of methods that, in the end, only needed to be about a dozen small methods. The needs are so minimal for decoding -- one relatively simple binary decoder method along with the 15 array codec methods.
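Some of those array codecs rely on "recursive indexing". As we understand the MMTF spec, this packs 32-bit integers into 16-bit slots, where a slot holding the maximum or minimum 16-bit value means "keep adding the next slot". Below is a sketch of just that step. The class name RecursiveIndex is hypothetical, and the header and byte-to-int16 parsing are omitted.

```java
import java.util.Arrays;

// Hypothetical sketch of recursive-index decoding (int16 slots -> int32 values).
final class RecursiveIndex {
    private RecursiveIndex() {}

    static int[] decode(short[] in) {
        int[] out = new int[in.length]; // upper bound on the number of values
        int sum = 0, k = 0;
        for (short v : in) {
            sum += v;
            // a slot at MAX or MIN continues the current value; anything else ends it
            if (v != Short.MAX_VALUE && v != Short.MIN_VALUE) {
                out[k++] = sum;
                sum = 0;
            }
        }
        return Arrays.copyOf(out, k); // an unterminated trailing run is dropped
    }
}
```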

Right?

Bob

arose commented 7 years ago

If you can use a separate file for JavaScript, there is mmtf.js (https://github.com/rcsb/mmtf-javascript/blob/master/dist/mmtf.js), which includes everything for decoding and encoding in ~13KB (ungzipped).

BobHanson commented 7 years ago

And an unobscurified version of that?


Robert M. Hanson Larson-Anderson Professor of Chemistry St. Olaf College Northfield, MN http://www.stolaf.edu/people/hansonr

If nature does not answer first what we want, it is better to take what answer we get.

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900

arose commented 7 years ago

And an unobscurified version of that?

Currently you have to build it yourself; I opened an issue: https://github.com/rcsb/mmtf-javascript/issues/12

andreasprlic commented 7 years ago

Just to chime in here. I did some profiling and it seems that the fundamental decoding in mmtf-java is slow. I suspect that using jackson adds some overhead and inefficiencies. We should try to do something as simple as what mmtf-javascript is doing!
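To show how small these decoders can be, here is a sketch of delta plus run-length decoding, which we take to be codec 8 in the MMTF spec. The run-length pairs are expanded first, and the result is then turned into a running sum. The class name DeltaRunLength is hypothetical. The input is assumed to be the int32 pairs that follow the 12-byte header, already converted from bytes.

```java
// Hypothetical sketch: run-length expansion followed by delta (running-sum) decoding.
final class DeltaRunLength {
    private DeltaRunLength() {}

    /** pairs = {value0, count0, value1, count1, ...} */
    static int[] decode(int[] pairs) {
        int n = 0;
        for (int p = 1; p < pairs.length; p += 2) n += pairs[p];
        int[] out = new int[n];
        int i = 0, prev = 0;
        for (int p = 0; p + 1 < pairs.length; p += 2) {
            for (int j = 0; j < pairs[p + 1]; j++) {
                prev += pairs[p]; // each expanded entry is a delta from the previous value
                out[i++] = prev;
            }
        }
        return out;
    }
}
```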

pwrose commented 7 years ago

A simple solution would have other benefits, too, including:

1. we would be able to ignore custom records that users may add to MMTF, and
2. we could make the method serializable, which is required to run it on Spark in a multi-server environment.
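On the Spark point: a decoder that holds only plain fields and implements java.io.Serializable can be written out and read back with standard Java serialization. That is roughly what happens when Spark ships a task closure to its executors. The sketch below uses a hypothetical LiteDecoder class. roundTrip is a test helper for illustration, not part of any MMTF API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical decoder with no references to non-serializable library objects.
final class LiteDecoder implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Reads a big-endian 4-byte int at byte offset off. */
    int readInt32(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    /** Serializes and deserializes obj with plain Java serialization. */
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }
}
```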


iclkevin commented 7 years ago

I fully agree with Bob on this. MMTF would be much more accessible if the limited functionality taken from the external dependencies were written directly into the MMTF jars, making them very lightweight. I know Jose Duarte suggested that implementations from the open-source community are trustworthy and should be used, but they also carry overhead that exists to serve their other users, and having control over this functionality internally would let you tailor the functions to your goals. It would also prevent a lot of headaches for those of us trying to include it :).

Love the MMTF format, btw. It is great to see all those bond types loaded in and it is very fast!

josemduarte commented 7 years ago

@iclkevin note that since version 1.0.5 the default is to decode through a (slightly modified) version of @BobHanson's code. The msgpack lib dependency is still there in any case. Decoding through the msgpack lib can be switched on with a flag.

iclkevin commented 7 years ago

In order to get the library to run at all, I need the following:

For input:

jackson-annotations-2.8.0, jackson-core-2.8.8, jackson-databind-2.8.8, jackson-dataformat-msgpack-0.7.0-M5

I don't know if all of those versions are compatible, but they were the latest I could find.

As for output, so far I have:

commons-lang-2.6, msgpack-core-0.8.11

Decoding works fine for me, but I haven't been able to find the right dependencies for encoding.

The above version of msgpack doesn't seem to be compatible (java.lang.NoSuchMethodError: org.msgpack.core.MessagePacker.<init>(Lorg/msgpack/core/buffer/MessageBufferOutput;)V). Whatever that means...
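The JVM reports constructors under the method name <init>, and a descriptor such as (Lorg/msgpack/core/buffer/MessageBufferOutput;)V means "takes one MessageBufferOutput, returns void". So this kind of error says the compiled code expected a MessagePacker constructor with exactly that signature, but the msgpack-core jar on the runtime classpath does not declare one. That points to a mismatch between the version mmtf-java was built against and the version supplied. The hypothetical reflection helper below checks for this before anything runs. It is a diagnostic sketch, not part of mmtf-java or msgpack.

```java
// Hypothetical diagnostic: does the class on the runtime classpath declare this constructor?
final class ClasspathCheck {
    private ClasspathCheck() {}

    static boolean hasConstructor(String className, String... paramClassNames) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        try {
            Class<?> cls = Class.forName(className, false, loader); // load without initializing
            Class<?>[] params = new Class<?>[paramClassNames.length];
            for (int i = 0; i < params.length; i++) {
                params[i] = Class.forName(paramClassNames[i], false, loader);
            }
            cls.getDeclaredConstructor(params); // throws if no exact match is declared
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }
}
```

For this case one would call ClasspathCheck.hasConstructor("org.msgpack.core.MessagePacker", "org.msgpack.core.buffer.MessageBufferOutput") with the candidate jars on the classpath.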

Do you still plan on using the jackson libraries? Do you have a zip of the current dependencies so we can run the encoder? Do you plan on using your own functions for encoding as well? I definitely don't want to turn on msgpack for decoding.

Thanks, Kevin Theisen

josemduarte commented 7 years ago

From the pom file I can see that mmtf-java currently depends on msgpack 0.7.1. Maven should take care of any sub-dependencies of that. Are you using Maven?

iclkevin commented 7 years ago

Thanks, I see. We do not use Maven; we have a custom build system. Are there any other dependencies I should know about?

josemduarte commented 7 years ago

It's all in Maven. If you really can't use Maven, try something like mvn dependency:tree, which should show the full dependency tree; you can then manually extract the dependencies from there. But that can be difficult.

sroughley commented 5 years ago

@iclkevin note that since version 1.0.5 the default is to decode through a (slightly modified) version of @BobHanson's code. The msgpack lib dependency is in any case still there. Decoding through the msgpack lib can be switched on with a flag.

Is there an example of using this, or a way of removing the msgpack lib dependency completely? It would be very useful to have a completely self-contained deserializer/serializer, if that is possible.

Thanks

Steve

josemduarte commented 5 years ago

As mentioned above, the code already has a self-contained serializer/deserializer. We left the msgpack dependency purely as a fallback solution. But by now I'm pretty sure we can get rid of it. It shouldn't be too difficult to remove the dependency from the pom along with the related code. Open for pull requests :)

sroughley commented 5 years ago

I could only see a deserializer, but in that case it shouldn't be a huge leap to go in the opposite direction! I will have a think about it...
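For the Type 9 codec from Bob's earlier snippet, going "in the opposite direction" means four steps: scale each float by the divisor, round to an int, run-length encode the ints, and prepend the 12-byte header. Below is a sketch under those assumptions. Type9Encoder is a hypothetical name. A real serializer would still have to place the resulting bytes in a MessagePack binary field, which is not shown.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical inverse of the Type 9 decoder: float -> scaled int -> run-length pairs.
final class Type9Encoder {
    private Type9Encoder() {}

    static byte[] encode(float[] values, int divisor) {
        List<int[]> runs = new ArrayList<>(); // each entry: {value, count}
        for (float f : values) {
            int v = Math.round(f * divisor); // lossy beyond 1/divisor precision
            int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && last[0] == v) last[1]++;
            else runs.add(new int[] {v, 1});
        }
        ByteBuffer buf = ByteBuffer.allocate(12 + 8 * runs.size()); // big-endian by default
        buf.putInt(9).putInt(values.length).putInt(divisor);        // codec, length, param
        for (int[] r : runs) buf.putInt(r[0]).putInt(r[1]);
        return buf.array();
    }
}
```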

josemduarte commented 5 years ago

Indeed you are totally right, there's only a built-in deserializer. No serializer yet. My bad.

So getting rid of the msgpack dependency requires writing a serializer after all, which definitely involves more work.

pwrose commented 5 years ago

The code has a self-contained deserializer for performance reasons, but the serializer still uses msgpack. Jmol has its own implementation; I'm not sure whether it contains a serializer, but it would be worth checking if someone is interested in developing one.


BobHanson commented 5 years ago

I guess by "serializer" you mean software that creates mmtf format files. Peter is correct -- Jmol, as an analysis tool, doesn't need to actually make these files.

Jose, I agree completely with the off-the-shelf philosophy. Writing Java for a JavaScript implementation (SwingJS) involves additional constraints, as not everything in Java translates directly into JavaScript (long values greater than 2^53 and direct local disk access come to mind). Still, for the most part, we can just drop new Java packages in and have them immediately functional in JavaScript. As I recall, I had some problems with msgpack (probably the 64-bit long integer issue) and had to make a few Java adaptations to get it working in JavaScript.
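The long-value problem arises because JavaScript numbers are IEEE-754 doubles. A double represents every integer of magnitude up to 2^53 exactly, but not every integer beyond that. The effect can be reproduced in plain Java by routing a long through a double. LongPrecision is a hypothetical name used only for illustration.

```java
// Hypothetical illustration: does a long survive conversion to a double (a JavaScript Number)?
final class LongPrecision {
    private LongPrecision() {}

    static boolean survivesDouble(long v) {
        double d = v;
        // (long) saturates at Long.MAX_VALUE, so reject d == 2^63 explicitly
        return d < 0x1p63 && (long) d == v;
    }
}
```

For example, survivesDouble(1L << 53) is true, while survivesDouble((1L << 53) + 1) is false because that value rounds to 2^53.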

I don't plan to write a serializer for JavaScript, but since SwingJS is just an implementation of Java in JavaScript, there's no particular reason the msgpack package couldn't be used directly by any SwingJS implementation, unless there is that long value issue.

My read was that there would be little or no interest in a JavaScript version of mmtf file creation. It seems to me that is best left to RCSB to do right, in Java or C++, and leave just the reading to the rest of us.

Bob


josemduarte commented 5 years ago

My read was that there would be little or no interest in a JavaScript version of mmtf file creation

The mmtf-javascript library can do both encoding and decoding.

sroughley commented 5 years ago

Yes, I started looking at serialisation last night. I wondered about a separate mmtf-lite (or something along those lines) without the extra external dependencies, ideally with mmtf-api as the only dependency? A lot of the classes would be the same as the existing ones or need only minor modifications.

There are obviously decisions to make about which MessagePack integer family to use for any given short/int/long, in particular signed versus unsigned. My instinct was to use the most compact form available for the given value. That does mean a long such as 32L might deserialize as something other than a long, but that would be an acceptable cast if required. Also, the Jmol library appears to have been refactored, so the link above to its deserializer is now dead.
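The "most compact form" strategy can be sketched with the integer formats from the MessagePack specification: positive and negative fixint first, then uint 8/16/32/64 for non-negative values and int 8/16/32/64 for negative ones, all big-endian. The class name CompactInt is hypothetical, and this is not the msgpack-java API. Because 32L comes out as the single byte 0x20, a reader cannot tell that the value started out as a long, which is the cast issue described above.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: write a long in the smallest MessagePack integer format that holds it.
final class CompactInt {
    private CompactInt() {}

    /** Appends the low nBytes bytes of v, most significant first. */
    static void writeBE(ByteArrayOutputStream out, long v, int nBytes) {
        for (int shift = (nBytes - 1) * 8; shift >= 0; shift -= 8) {
            out.write((int) (v >>> shift) & 0xff);
        }
    }

    static byte[] pack(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (v >= 0) {
            if (v <= 0x7fL) out.write((int) v);                                 // positive fixint
            else if (v <= 0xffL) { out.write(0xcc); writeBE(out, v, 1); }       // uint 8
            else if (v <= 0xffffL) { out.write(0xcd); writeBE(out, v, 2); }     // uint 16
            else if (v <= 0xffffffffL) { out.write(0xce); writeBE(out, v, 4); } // uint 32
            else { out.write(0xcf); writeBE(out, v, 8); }                       // uint 64
        } else {
            if (v >= -32) out.write((int) v & 0xff);                                  // negative fixint
            else if (v >= Byte.MIN_VALUE) { out.write(0xd0); writeBE(out, v, 1); }    // int 8
            else if (v >= Short.MIN_VALUE) { out.write(0xd1); writeBE(out, v, 2); }   // int 16
            else if (v >= Integer.MIN_VALUE) { out.write(0xd2); writeBE(out, v, 4); } // int 32
            else { out.write(0xd3); writeBE(out, v, 8); }                             // int 64
        }
        return out.toByteArray();
    }
}
```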

No interest in the javascript from me!

Steve