ldbc-dev / ldbc_snb_datagen_deprecated2015

LDBC-SNB Data Generator
GNU General Public License v3.0

Date inconsistencies #4

Closed: wileeam closed this issue 10 years ago

wileeam commented 10 years ago

While using a dataset generated for 10 years and 1000 people, and another one used for the recent ACM SIGMOD 2014 programming competition, we found some date inconsistencies: comments, and even posts, dated before their author's join date. See below:

$ grep 991 person_0.csv
991|Abdul-Malik|Alhouthi|male|1988-03-29|2020-09-16T01:47:32Z|...
$ grep 374055 comment_hasCreator_person_0.csv
374055|991|
$ grep 374055 comment_0.csv
374055|2020-01-02T07:18:30Z|...

$ grep ^998 person.csv
998|Jimmy|Kalaba|female|1987-01-23|2012-02-14T10:39:43Z|41.72.96.18|Chrome
$ grep ^614730 post_hasCreator_person.csv
614730|998
$ grep ^614730 post.csv
614730||2012-02-14T04:45:53Z

I don't know whether timeline consistency is a requirement of the project (and therefore something the generator must guarantee), but it would be nice to have...

ArnauPrat commented 10 years ago

Hi Guillermo,

Thank you for pointing out these inconsistencies. We will look into them and fix them as soon as possible.

Regards,

Arnau

In reply to Guillermo's report of 13/05/2014 15:13 (https://github.com/ldbc/ldbc_socialnet_bm/issues/4).

ArnauPrat commented 10 years ago

Hi Guillermo, I have found some bugs that could be causing the date inconsistencies. However, I cannot yet confirm that we have fully gotten rid of the problem. In the coming weeks, we will implement a tool to check the integrity of the dataset.

Thanks for your help

wileeam commented 10 years ago

Awesome, that's great news! The fewer date inconsistencies, the better. In our scenario we would remove inconsistent comments/posts anyway, but when we filed this issue there were quite a lot of them, so removing them would shrink the dataset considerably.

Meanwhile, I think we are close to implementing this date-consistency check in the tool we use to parse the dataset, so we should be able to give you statistics on both the current dataset and a new one generated once you push the changes to the repository.
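A check of this kind can be sketched as follows. This is a minimal Python sketch, not the actual tool mentioned above: it assumes pipe-separated files laid out as in the grep output at the top of this issue (a person's join date is the sixth field, a comment's creation date the second, a post's the third, and each `*_hasCreator_person.csv` row maps a message id to a person id). The script name `check_dates.py`, its arguments and the header-skipping rule are hypothetical.

```python
import csv
import sys
from datetime import datetime


def parse_ts(value):
    """Parse a UTC timestamp in the form 2012-02-14T10:39:43Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def data_rows(rows):
    """Yield non-empty rows whose first field is a numeric id.

    Assumption: this skips a header line, if the files have one.
    """
    for row in rows:
        if row and row[0].isdigit():
            yield row


def find_inconsistencies(person_rows, has_creator_rows, message_rows, date_col):
    """Return (message_id, person_id, created, joined) for every message
    whose creation date precedes its creator's join date."""
    # person file: id in column 0, join date (creationDate) in column 5
    joined = {r[0]: parse_ts(r[5]) for r in data_rows(person_rows)}
    # hasCreator file: message id in column 0, person id in column 1
    creator = {r[0]: r[1] for r in data_rows(has_creator_rows)}
    bad = []
    for r in data_rows(message_rows):
        person_id = creator.get(r[0])
        if person_id not in joined:
            continue  # dangling reference: not a date problem, skipped here
        created = parse_ts(r[date_col])
        if created < joined[person_id]:
            bad.append((r[0], person_id, created, joined[person_id]))
    return bad


def read_psv(path):
    """Read a pipe-separated file; quotes are treated as ordinary characters."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE))


if __name__ == "__main__":
    # Hypothetical usage:
    #   python check_dates.py person_0.csv comment_hasCreator_person_0.csv comment_0.csv 1
    person, has_creator, messages, col = sys.argv[1:5]
    for item in find_inconsistencies(read_psv(person), read_psv(has_creator),
                                     read_psv(messages), int(col)):
        print("%s by %s created %s, before join date %s" % item)
```

Run once with the comment files (date column 1) and once with the post files (date column 2); counting the reported rows gives the kind of before/after statistics described above.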

Thanks for the quick answer and the upcoming fix! Much appreciated!

wileeam commented 10 years ago

Hello again Arnau,

It seems we are still getting some of these inconsistencies, at least between comments' dates and their authors' join dates. I made a small change to post generation, similar to the one you committed for comments (1a267ce15ddb90999578dd1cc07f05ba0617ab76), but I am not sure it fixes the problem, since the inconsistency showed up again for comments.
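For illustration only: this thread does not show what commit 1a267ce15ddb90999578dd1cc07f05ba0617ab76 actually changes, so the following is a hypothetical Python sketch of one way a generator could avoid the problem, by drawing each message's creation date uniformly from a window that starts at its author's join date. The function name `sample_creation_date`, the `end` bound and the optional `parent_created` constraint (assuming a comment should not precede the message it replies to) are assumptions, not the generator's code.

```python
import random
from datetime import datetime, timedelta


def sample_creation_date(rng, author_joined, end, parent_created=None):
    """Draw a creation date uniformly in [earliest, end].

    Hypothetical helper: `earliest` is the author's join date or, when a
    parent message is given, the later of that and the parent's date.
    """
    earliest = author_joined if parent_created is None else max(author_joined, parent_created)
    if earliest > end:
        raise ValueError("no valid creation date: earliest bound is after `end`")
    span = (end - earliest).total_seconds()
    # rng.uniform(0, span) stays within [0, span], so the result stays in the window
    return earliest + timedelta(seconds=rng.uniform(0, span))


if __name__ == "__main__":
    rng = random.Random(42)
    joined = datetime(2012, 2, 14, 10, 39, 43)  # join date of person 998 above
    print(sample_creation_date(rng, joined, datetime(2012, 12, 31)))
```

The design point is that the lower bound is applied when the date is drawn, rather than by filtering afterwards, so no generated post or comment can fall before its author joined.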

Let us know if you want some further information to help you debug the problem.

/Guillermo

ArnauPrat commented 10 years ago

Hi Guillermo,

Thank you for reporting this. For the next two weeks we will be very busy writing the LDBC benchmark specifications, but once we finish them I will look into this issue in detail.

Regards,

Arnau

In reply to Guillermo's message of Mon, May 19, 2014 at 2:27 PM.

ArnauPrat commented 10 years ago

Hi again Guillermo,

I have fixed a bug that caused date inconsistencies between comments and their creators. Other inconsistencies may remain, but at least this one should be solved. It would be nice if you could run your date checker to confirm this.

Thanks a lot.

Arnau

wileeam commented 10 years ago

Hello again on this...

It seems there are no more inconsistencies for now (for either comments or posts), so... good job! And thanks for the quick support! I'm filing another inconsistency as a separate issue so you can track it better, but this one looks resolved according to our analysis.