ldbc-dev / ldbc_snb_datagen_deprecated2015

LDBC-SNB Data Generator
GNU General Public License v3.0

Several improvements #1

Closed ArnauPrat closed 10 years ago

ArnauPrat commented 10 years ago

Changes by Xavier Sanchez before 09-12-2013

Dictionary

Major: IPAddressDictionary fix. The original S3G2 implementation had an erroneous conception of IPv4 range semantics (e.g. 192.168.1.0/24 would have been translated to the IP range 192.168.1.0-192.168.1.23 instead of the true range 192.168.1.0-192.168.1.255). Side effect: faster lookup of the country an IP address belongs to.
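To illustrate the CIDR semantics referred to above, here is a minimal sketch of how a block such as 192.168.1.0/24 maps to its first and last address. It is not the IPAddressDictionary code; the class name CidrRangeSketch and all of its methods are hypothetical and exist only for this example.

```java
// Hypothetical helper, not part of the generator: expands an IPv4 CIDR block
// into the first and last address it covers.
final class CidrRangeSketch {
    final long first; // lowest address of the block, stored as an unsigned 32-bit value
    final long last;  // highest address of the block

    private CidrRangeSketch(long first, long last) {
        this.first = first;
        this.last = last;
    }

    /** Parses a block written as "a.b.c.d/prefix". */
    static CidrRangeSketch parse(String cidr) {
        String[] parts = cidr.trim().split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected a.b.c.d/prefix: " + cidr);
        }
        long base = toLong(parts[0]);
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("Prefix must be in 0..32: " + prefix);
        }
        int hostBits = 32 - prefix;
        long hostMask = (1L << hostBits) - 1;        // e.g. /24 leaves 8 host bits: 0xFF
        long first = base & ~hostMask & 0xFFFFFFFFL; // clear the host bits
        long last = first | hostMask;                // set all host bits
        return new CidrRangeSketch(first, last);
    }

    /** Converts dotted-quad notation to an unsigned 32-bit value held in a long. */
    static long toLong(String dotted) {
        String[] octets = dotted.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("Expected four octets: " + dotted);
        }
        long value = 0;
        for (String octet : octets) {
            int o = Integer.parseInt(octet);
            if (o < 0 || o > 255) {
                throw new IllegalArgumentException("Octet out of range: " + octet);
            }
            value = (value << 8) | o;
        }
        return value;
    }

    /** Converts an unsigned 32-bit value back to dotted-quad notation. */
    static String toDotted(long ip) {
        return ((ip >> 24) & 0xFF) + "." + ((ip >> 16) & 0xFF) + "."
                + ((ip >> 8) & 0xFF) + "." + (ip & 0xFF);
    }

    /** True if the given address falls inside this block. */
    boolean contains(long ip) {
        return ip >= first && ip <= last;
    }
}
```

With this sketch, parsing "192.168.1.0/24" gives a block whose last address converts back to 192.168.1.255, i.e. 256 addresses in total.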

Major: Simplification of LocationDictionary to a cleaner interface. More maintainable and user-friendly.

Major: Replaced methods that leaked internal representation to other classes with a safer interface.
Motivation: Safer data access and control (data access only via get/set).
Code clarity: Instead of accessing several obscure HashMaps and Lists with arbitrary names, data is accessed through the get and set methods of the Dictionary instance.
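As a rough illustration of the get/set idea described above, the sketch below keeps its map private and exposes only accessor methods. The class SketchLocationDictionary and its method names are hypothetical and are not the actual LocationDictionary interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dictionary: callers never see the backing map,
// only the get/set methods defined here.
final class SketchLocationDictionary {
    // Internal representation; free to change without affecting callers.
    private final Map<Integer, String> namesById = new HashMap<>();

    /** Registers or replaces the name of a location. */
    void setLocationName(int locationId, String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Location name must not be empty");
        }
        namesById.put(locationId, name);
    }

    /** Returns the name of a known location, failing loudly on unknown ids. */
    String getLocationName(int locationId) {
        String name = namesById.get(locationId);
        if (name == null) {
            throw new IllegalArgumentException("Unknown location id: " + locationId);
        }
        return name;
    }

    /** Number of locations currently stored. */
    int size() {
        return namesById.size();
    }
}
```

Because other classes go through these methods rather than touching a shared HashMap directly, validation lives in one place and the storage can later be swapped (for an array, say) without changing callers.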

Minor: Renamed inappropriate variable names. Ex: cumDistribution to cummulativeDistribution.

Minor: Code cleanup: Better variable and method names. Replaced values hardcoded inside methods with static class variables. Removed unnecessary class variables: some were not used at all and others were used in only one method (wasted space).

Minor/Major: Started adding method Javadoc and comments.

Generator

Major: Enabled the generation statistics (user id range, countries and tags used, etc.) using the gson library. (This was not published because it had been pending Renzo's approval since August.)

Major: Started code cleanup and commenting of the most relevant class: ScalableGenerator.json.

Minor: Generator classes interface cleanup. Removal of redundant methods.

Minor: Code cleanup.

Objects

Minor: Removal of the unused Stream classes.

Serializer

Minor/Major: Removed the erroneous @Override from the interface methods (Java interface methods do not use that annotation).
Major: Use of the dictionaries instead of their leaked internal representation.
Major: Changed the method getTriplesGenerated() to something more generic, allowing the CSV serializer to report how many rows it creates.
Minor: Added an EmptySerializer (it doesn't do anything at all) for debugging purposes and to avoid IO time.
Minor: Simplification of constructors. Previously there were 3 class constructors, 2 of which were not used.
Minor: Only basic Javadoc, since the format representation is in constant change.
Minor: Code cleanup.
Minor: Improved RDF semantics in the Turtle generation (subject, predicate, object, which I was not aware of at that time).
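The generic counter and the do-nothing serializer mentioned in this list could look roughly like the sketch below. The interface SketchSerializer, the method name unitsGenerated(), and both implementing classes are hypothetical; the real interface, its method names and the CSV format may differ (the CSV writer here is deliberately naive, with no escaping).

```java
// Hypothetical serializer contract: a format-neutral counter replaces a
// triple-specific one, so every format can report what it produced.
interface SketchSerializer {
    void serialize(String... fields);

    /** Units produced so far: rows for CSV, triples for an RDF format, and so on. */
    long unitsGenerated();
}

// Does nothing at all: useful to measure generation time without IO cost.
final class SketchEmptySerializer implements SketchSerializer {
    public void serialize(String... fields) {
        // Intentionally empty.
    }

    public long unitsGenerated() {
        return 0;
    }
}

// Naive in-memory CSV writer that counts the rows it emits.
final class SketchCsvSerializer implements SketchSerializer {
    private final StringBuilder out = new StringBuilder();
    private long rows = 0;

    public void serialize(String... fields) {
        out.append(String.join(",", fields)).append('\n');
        rows++;
    }

    public long unitsGenerated() {
        return rows;
    }

    /** Everything written so far, exposed only for inspection in this sketch. */
    String output() {
        return out.toString();
    }
}
```

Swapping SketchCsvSerializer for SketchEmptySerializer behind the same interface lets the generator run unchanged while skipping all output work.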

Storage

Minor: Removed an unused class.

Util

Nothing, but there is always room for code cleanup.

Vocabulary

Minor: Code cleanup: Removal of unused methods. Renamed some static variables for clarity, for readers not familiar with RDF semantics (e.g. NS to NAMESPACE).

Changes by Arnau Prat

16-12-2013 Minor: Improved the formatting of emails. Removed accents and successions of more than one dot.
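One possible way to do the email cleanup described above is sketched here, assuming Unicode decomposition is an acceptable way to strip accents. The class EmailSketch and its clean method are hypothetical, not the code that was committed.

```java
import java.text.Normalizer;

// Hypothetical helper: strips accents and collapses runs of dots in an email string.
final class EmailSketch {
    static String clean(String raw) {
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(raw, Normalizer.Form.NFD);
        // Drop the combining marks, keeping only the base letters.
        String withoutAccents = decomposed.replaceAll("\\p{M}+", "");
        // Collapse two or more consecutive dots into a single dot.
        return withoutAccents.replaceAll("\\.{2,}", ".");
    }
}
```

For instance, an input with accented letters and a doubled dot, such as "jos\u00e9..garc\u00eda@example.com", comes out as "jose.garcia@example.com". Letters that have no decomposition (for example "ø") pass through unchanged, so a real implementation may need an extra mapping step.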

17-12-2013
Minor: Improved code readability by making variable names consistent, and by rewriting some pieces of code.