stg-tud / MUDetect

Mozilla Public License 2.0

Possibility to serialize APIUsageExample instances #16

SNielebock closed this issue 5 years ago

SNielebock commented 5 years ago

While using the miner, I noticed that building the APIUsageExample instances takes more time than the actual mining. For testing purposes (e.g. mining the same samples several times), it would therefore be nice if we could store the APIUsageExample graphs in serialized form on the hard drive and reload them to speed up the whole process. This should be feasible: the APIUsageExample class extends DirectedMultigraph from jGraphT, which already implements the Serializable interface (see https://jgrapht.org/javadoc/org/jgrapht/graph/DirectedMultigraph.html), so APIUsageExample inherits it. What remains is making the further classes it references, e.g. Location, implement Serializable as well.
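A minimal sketch of the idea with plain Java serialization. It uses hypothetical stand-in classes (`GraphBase`, `UsageExample`, `Location`, `SerializationRoundTrip`), not MUDetect's or jGraphT's actual classes. It illustrates the Java rule behind the request: a subclass of a `Serializable` class is serializable without redeclaring the interface, but every object it references must be serializable too. To store the graphs on disk, the byte streams would simply be replaced by file streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for jGraphT's DirectedMultigraph, which implements Serializable.
class GraphBase implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<String> nodes = new ArrayList<>();
}

// Hypothetical stand-in for Location. It must implement Serializable itself,
// otherwise writing an example that references it throws NotSerializableException.
class Location implements Serializable {
    private static final long serialVersionUID = 1L;
    final String filePath;
    final String methodSignature;

    Location(String filePath, String methodSignature) {
        this.filePath = filePath;
        this.methodSignature = methodSignature;
    }
}

// Hypothetical stand-in for APIUsageExample: no "implements Serializable" needed,
// the interface is inherited from GraphBase.
class UsageExample extends GraphBase {
    private static final long serialVersionUID = 1L;
    final Location location;

    UsageExample(Location location) {
        this.location = location;
    }
}

public class SerializationRoundTrip {
    // Writes an object graph into a byte array using Java serialization.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Reads an object graph back from a byte array.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        UsageExample original = new UsageExample(new Location("src/Foo.java", "Foo.bar()"));
        original.nodes.add("Iterator.hasNext()");
        UsageExample copy = (UsageExample) deserialize(serialize(original));
        // prints "Foo.bar() [Iterator.hasNext()]"
        System.out.println(copy.location.methodSignature + " " + copy.nodes);
    }
}
```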

salsolatragus commented 5 years ago

Thanks for sharing this thought!

You may be able to achieve what you want by using our DOT persistence export/import. This allows you to write APIUsageExamples to disk and read them back in again.

I haven't tested the performance on large numbers of AUGs, though. It should be faster than regenerating the graphs from scratch, especially because you read only one file instead of all the tiny source files. It might still be slower than Java serialisation, though...

Does this solve the problem for you?

SNielebock commented 5 years ago

Hi @salsolatragus, thanks for the hint. I implemented your suggestion, and for 30 Java files (roughly 100 AUGs) I could reduce the generation time from ~60 sec (generation from scratch) to ~10 sec (loading from file). I still got some ImportExceptions from jGraphT (I did not check in depth whether this is just a problem in our implementation); however, the mining results still look the same. Nevertheless, native serialization might boost performance considerably, especially as the number of files grows (>1000 files).
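The load-from-file workflow described above boils down to a cache-or-build step. Below is a minimal sketch of that pattern using native Java serialization rather than the DOT export. `ExampleCache` and `loadOrBuild` are hypothetical names, not part of MUDetect, and the `Supplier` stands in for whatever code actually builds the APIUsageExample instances.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.function.Supplier;

public class ExampleCache {
    // Returns the cached value if the cache file exists; otherwise runs the
    // (expensive) builder, writes its result to the cache file, and returns it.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T loadOrBuild(Path cacheFile, Supplier<T> builder)
            throws IOException, ClassNotFoundException {
        if (Files.exists(cacheFile)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(cacheFile))) {
                return (T) in.readObject();
            }
        }
        T value = builder.get();
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(cacheFile))) {
            out.writeObject(value);
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        Path cache = Files.createTempFile("examples", ".ser");
        Files.delete(cache); // start without a cache file
        int[] builds = {0};
        Supplier<ArrayList<String>> expensiveBuild = () -> {
            builds[0]++; // counts how often the slow path runs
            ArrayList<String> result = new ArrayList<>();
            result.add("example-1");
            return result;
        };
        ArrayList<String> first = loadOrBuild(cache, expensiveBuild);  // builds and writes
        ArrayList<String> second = loadOrBuild(cache, expensiveBuild); // reads from disk
        // prints "true builds=1"
        System.out.println(first.equals(second) + " builds=" + builds[0]);
        Files.deleteIfExists(cache);
    }
}
```

The cache does not notice when the source files change, so deleting a stale cache file is left to the caller. Java serialization also ties the file to the class versions, so a cache written before the classes change may fail to load afterwards.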

salsolatragus commented 5 years ago

Well, a 6x performance boost doesn't sound too bad for a start. I would actually expect the boost to increase with the number of files. As for native serialisation: I'm not sure how much faster it will be. Do you have reason to believe that the boost will again be significant? If so, it would be awesome if you gave it a try and made a pull request. I'll probably only be able to get back to this next year...

Please report back about the ImportException issues; I haven't used the DOT (de)serialisation very extensively myself yet. Maybe there's a bug somewhere...

SNielebock commented 5 years ago

I will give it a try, but for the moment this works for me. I might be able to make a pull request over Christmas.

Regarding the ImportException, I opened a new issue (#17).

SNielebock commented 5 years ago

Note that I actually had to implement the serialization (see pull request #18), since I noticed that the persistent DOT export only works for APIUsageGraphs, not for APIUsageExamples. The problem is that the APIUsageExample class contains the Location, which is not stored in the DOT representation, and I do need this information.
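To illustrate why `Location` matters for native serialization, here is a sketch of the failure mode. It assumes a hypothetical example class with a `location` field and does not reproduce what pull request #18 actually changed. Java serialization throws `NotSerializableException` as soon as it reaches a referenced object whose class does not implement `Serializable`, even if the containing class does.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical Location variant that does NOT implement Serializable.
class PlainLocation {
    final String filePath;
    PlainLocation(String filePath) { this.filePath = filePath; }
}

// Hypothetical Location variant that does implement Serializable.
class SerializableLocation implements Serializable {
    private static final long serialVersionUID = 1L;
    final String filePath;
    SerializableLocation(String filePath) { this.filePath = filePath; }
}

// Hypothetical example class: declared Serializable, but only as
// serializable as the objects its fields reference at runtime.
class ExampleWith<L> implements Serializable {
    private static final long serialVersionUID = 1L;
    final L location;
    ExampleWith(L location) { this.location = location; }
}

public class LocationFieldCheck {
    // Returns true if the object (and everything it references) can be
    // written with Java serialization, false on NotSerializableException.
    static boolean canSerialize(Object o) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new ExampleWith<>(new PlainLocation("Foo.java"))));        // false
        System.out.println(canSerialize(new ExampleWith<>(new SerializableLocation("Foo.java")))); // true
    }
}
```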

salsolatragus commented 5 years ago

I've just deployed the current master with serializability merged as 0.0.3-SNAPSHOT.