src-d / gitbase

SQL interface to git repositories, written in Go. https://docs.sourced.tech/gitbase
Apache License 2.0

Issue with commit_files on github.com/tensorflow/tensorflow #984

Closed alexpdp7 closed 4 years ago

alexpdp7 commented 4 years ago

The tensorflow repo shows an oddity in the commit_files table: some commits have a very low commit_files row count compared to other commits from the same timeframe.

One curious thing is that at least some of those commits (if not all) also exist, with the same hash, in another repository within the same org.

See 6cd3502f6398456b326b58d47d5cbc421fbd2905 below:

MySQL [gitbase]> select * from commits where commit_hash in ('6cd3502f6398456b326b58d47d5cbc421fbd2905', '932fcbbd3836022a862d2479d716fc9c7563ff47');
+----------------------------------+------------------------------------------+--------------------+-----------------------+---------------------+----------------+---------------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+----------------------------------------------+
| repository_id                    | commit_hash                              | commit_author_name | commit_author_email   | commit_author_when  | committer_name | committer_email     | committer_when      | commit_message                                                                                                                                                                                                                                                        | tree_hash                                | commit_parents                               |
+----------------------------------+------------------------------------------+--------------------+-----------------------+---------------------+----------------+---------------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+----------------------------------------------+
| github.com/tensorflow/mlir       | 6cd3502f6398456b326b58d47d5cbc421fbd2905 | Uday Bondhugula    | bondhugula@google.com | 2018-08-29 01:24:27 | jpienaar       | jpienaar@google.com | 2019-03-29 20:07:30 | Introduce loop unroll jam transformation.

- for test purposes, the unroll-jam pass unroll jams the first outermost loop.

While on this:
- fix StmtVisitor to allow overriding of function to iterate walk over children
  of a stmt.

PiperOrigin-RevId: 210644813
 | 5cb37034587b1485478a2f9fa0f0cf359d5b89db | ["c6aa35b99c9e885f8482990129d4029efee8b199"] |
| github.com/tensorflow/tensorflow | 6cd3502f6398456b326b58d47d5cbc421fbd2905 | Uday Bondhugula    | bondhugula@google.com | 2018-08-29 01:24:27 | jpienaar       | jpienaar@google.com | 2019-03-29 20:07:30 | Introduce loop unroll jam transformation.

- for test purposes, the unroll-jam pass unroll jams the first outermost loop.

While on this:
- fix StmtVisitor to allow overriding of function to iterate walk over children
  of a stmt.

PiperOrigin-RevId: 210644813
 | 5cb37034587b1485478a2f9fa0f0cf359d5b89db | ["c6aa35b99c9e885f8482990129d4029efee8b199"] |
| github.com/tensorflow/tensorflow | 932fcbbd3836022a862d2479d716fc9c7563ff47 | nrstott            | nrstott@gmail.com     | 2018-06-11 14:11:35 | nrstott        | nrstott@gmail.com   | 2018-06-11 14:11:35 | check that target_column is correct type
                                                                                                                                                                                                                             | ecd24348e2061a1ccce6f25b406799e04330a29b | ["19b77a282b1ade7788ae394f22ac0bd7b0a2ce76"] |
+----------------------------------+------------------------------------------+--------------------+-----------------------+---------------------+----------------+---------------------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------+----------------------------------------------+
3 rows in set (0.21 sec)
MySQL [gitbase]> select count(*) from commit_files where repository_id = 'github.com/tensorflow/tensorflow' and commit_hash = '6cd3502f6398456b326b58d47d5cbc421fbd2905';
+----------+
| COUNT(*) |
+----------+
|       84 |
+----------+
1 row in set (2.31 sec)
MySQL [gitbase]> select count(*) from commit_files where repository_id = 'github.com/tensorflow/tensorflow' and commit_hash = '932fcbbd3836022a862d2479d716fc9c7563ff47';
+----------+
| COUNT(*) |
+----------+
|    11223 |
+----------+
1 row in set (14.37 sec)

erizocosmico commented 4 years ago

As discussed IRL, this is working as expected: the repo just has multiple roots.
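
To illustrate what "multiple roots" means here, the following is a minimal sketch on a toy commit graph, not gitbase's implementation. The commit names, file paths, and helper functions (`all_roots`, `roots_of`, `commit_file_counts`) are all hypothetical. The idea is that a history imported from another repository starts from its own parentless commit, so commits on that line carry only the imported project's tree and therefore have far fewer files than commits on the main line.

```python
# Toy commit graph. Names and file lists are hypothetical and do not
# reflect tensorflow's real history.
COMMITS = {
    "main_root": {"parents": [], "files": {"README.md", "tensorflow/core.cc"}},
    "main_2": {
        "parents": ["main_root"],
        "files": {"README.md", "tensorflow/core.cc", "tensorflow/ops.cc"},
    },
    # Second, unrelated root: a history imported from another repository.
    "mlir_root": {"parents": [], "files": {"mlir/README.md"}},
    "mlir_2": {
        "parents": ["mlir_root"],
        "files": {"mlir/README.md", "mlir/unroll_jam.cc"},
    },
    # Merge commit joining both histories.
    "merge": {
        "parents": ["main_2", "mlir_2"],
        "files": {
            "README.md",
            "tensorflow/core.cc",
            "tensorflow/ops.cc",
            "mlir/README.md",
            "mlir/unroll_jam.cc",
        },
    },
}


def all_roots(commits):
    """Return every commit that has no parents, sorted by name."""
    return sorted(h for h, c in commits.items() if not c["parents"])


def roots_of(commit_hash, commits):
    """Return the set of parentless commits reachable from commit_hash."""
    seen, stack, roots = set(), [commit_hash], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = commits[current]["parents"]
        if not parents:
            roots.add(current)
        stack.extend(parents)
    return roots


def commit_file_counts(commits):
    """Map each commit to the number of files in its tree."""
    return {h: len(c["files"]) for h, c in commits.items()}


if __name__ == "__main__":
    print(all_roots(COMMITS))
    print(roots_of("mlir_2", COMMITS))
    print(commit_file_counts(COMMITS))
```

In a real checkout, `git rev-list --max-parents=0 HEAD` lists the root commits reachable from HEAD, which is one way to check whether a repository has more than one.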