Closed: jan-gerling closed this issue 4 years ago
This is indeed a bug. I can help you with the Hibernate magic once I'm back!
On Wed, 19 Feb 2020 at 17:26, Jan Gerling <notifications@github.com> wrote:
We create a lot of duplicate entries in the normalized tables, e.g. CommitMetaData (screenshot: https://user-images.githubusercontent.com/29139613/74825455-40ca4e80-530a-11ea-9775-c98eaf886c03.png).
This happens because we use a custom generated id to link the tables. A new id is generated for every new instance, even when the CommitMetaData is the same object.
Can we change this in order to reduce the database size? Furthermore, this leads to another issue: #62 https://github.com/mauricioaniche/predicting-refactoring-ml/issues/62.
--
Maurício Aniche, Delft University of Technology, http://www.mauricioaniche.com
I noticed that ProcessMetricsCollector generates many duplicates in its codeMetrics function: https://github.com/mauricioaniche/predicting-refactoring-ml/blob/1ed8d6f0527992d49c29f1af69c48680067cf476/data-collection/src/main/java/refactoringml/ProcessMetricsCollector.java#L234
We can avoid these duplicates by storing only StableCommits that are as complete as possible in the database. Related to issue #75.
@jan-gerling, please add this info to the monitoring, but for now it seems fine.
The db contains no duplicate entries in commitmetadata, refactoringinstances, stablecommitinstances, or projects.
Great news!
We create a lot of duplicate entries in the normalized tables, e.g. CommitMetaData.
This happens because we use a custom generated id to link the tables. A new id is generated for every new instance, even when the CommitMetaData is the same object.
Can we change the id to the commitId in order to reduce the database size and allow a unique mapping? Furthermore, this leads to another issue: #62.
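To illustrate the proposal, here is a minimal plain-Java sketch of the difference between creating one CommitMetaData per instance and keying it by commitId. It does not use Hibernate or the project's real classes: the `CommitMetaData` fields, the `CommitMetaDataRegistry` helper, and the example.org URLs are all assumptions made up for this example. In an actual Hibernate mapping, the analogous step would presumably be to make commitId the identifier (or a unique column) and to look up an existing row before creating a new one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommitMetaDataDedup {

    /** Simplified stand-in for the project's CommitMetaData entity (fields are assumptions). */
    static final class CommitMetaData {
        final String commitId;  // natural key: the Git commit hash
        final String commitUrl;

        CommitMetaData(String commitId, String commitUrl) {
            this.commitId = commitId;
            this.commitUrl = commitUrl;
        }
    }

    /** Hypothetical helper that hands out one shared CommitMetaData per commitId. */
    static final class CommitMetaDataRegistry {
        private final Map<String, CommitMetaData> byCommitId = new HashMap<>();

        CommitMetaData getOrCreate(String commitId, String commitUrl) {
            // Create the object only if this commitId has not been seen before.
            return byCommitId.computeIfAbsent(commitId, id -> new CommitMetaData(id, commitUrl));
        }

        int size() {
            return byCommitId.size();
        }
    }

    public static void main(String[] args) {
        // Three instances, two of which belong to the same commit.
        String[] commits = {"abc123", "abc123", "def456"};

        // Behaviour described in the issue: a fresh CommitMetaData per instance.
        List<CommitMetaData> perInstance = new ArrayList<>();
        for (String c : commits) {
            perInstance.add(new CommitMetaData(c, "https://example.org/commit/" + c));
        }

        // Proposed behaviour: reuse the CommitMetaData that already exists for a commitId.
        CommitMetaDataRegistry registry = new CommitMetaDataRegistry();
        for (String c : commits) {
            registry.getOrCreate(c, "https://example.org/commit/" + c);
        }

        System.out.println("entries with generated ids: " + perInstance.size()); // 3
        System.out.println("entries keyed by commitId:  " + registry.size());    // 2
    }
}
```

With the commitId as the key, every instance of the same commit points at the same CommitMetaData, which gives the unique mapping asked for above.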