Closed arenas-guerrero-julian closed 3 years ago
This is the origin of the difference:
But I can't see how the 0 become NULL; it should be 0 and only NULL if exact_times
is an empty value... maybe some problem importing the CSV file to MySQL? Needs further investigation.
The problem is in the input dataset, the value of exact_time is always NULL, bc it's optional in the GTFS model and our dataset does not provide that information. What I don't understand is why the output contains 0...
The problem is in the input dataset, the value of exact_time is always NULL, bc it's optional in the GTFS model and our dataset does not provide that information. What I don't understand is why the output contains 0...
The data that VIG uses have NULL values:
DROP TABLE IF EXISTS `FREQUENCIES`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `FREQUENCIES` (
`trip_id` varchar(200) NOT NULL,
`start_time` varchar(200) NOT NULL,
`end_time` varchar(200) DEFAULT NULL,
`headway_secs` int(11) DEFAULT NULL,
`exact_times` int(11) DEFAULT '0',
PRIMARY KEY (`trip_id`,`start_time`),
CONSTRAINT `FREQUENCIES_ibfk_1` FOREIGN KEY (`trip_id`) REFERENCES `TRIPS` (`trip_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Dumping data for table `FREQUENCIES`
--
LOCK TABLES `FREQUENCIES` WRITE;
/*!40000 ALTER TABLE `FREQUENCIES` DISABLE KEYS */;
INSERT INTO `FREQUENCIES` VALUES ('4_I12-001_2016I12_1_1_4__1___','00:00:00','01:00:00',600,NULL),('4_I12-001_2016I12_1_1_4__1___','01:00:00','02:00:00',900,NULL),('4_I12-001_2016I12_1_1_4__1___','06:05:00','07:00:00',540,NULL)
[...]
And confirmed:
mysql> select * from FREQUENCIES LIMIT 10;
+-------------------------------+------------+----------+--------------+-------------+
| trip_id | start_time | end_time | headway_secs | exact_times |
+-------------------------------+------------+----------+--------------+-------------+
| 4_I12-001_2016I12_1_1_4__1___ | 00:00:00 | 01:00:00 | 600 | NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 01:00:00 | 02:00:00 | 900 | NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 06:05:00 | 07:00:00 | 540 | NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 07:00:00 | 09:00:00 | 480 | NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 09:00:00 | 18:00:00 | 480 | NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 18:00:00 | 21:30:00 | 360 | NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 21:30:00 | 23:00:00 | 390 | NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 23:00:00 | 24:00:00 | 450 | NULL |
| 4_I12-001_2016I12_2_1_4__1___ | 00:00:00 | 01:00:00 | 600 | NULL |
| 4_I12-001_2016I12_2_1_4__1___ | 01:00:00 | 02:00:00 | 900 | NULL |
+-------------------------------+------------+----------+--------------+-------------+
10 rows in set (0.00 sec)
Maybe a VIG bug?
I'll make some tests locally
seems that the problem is here: SET exact_times = IF(exact_times='',NULL,exact_times);
Possible problems on SET statements while importing data, further investigation needed.
Relevant:
LOAD DATA interprets an empty field value differently from a missing field:
https://dev.mysql.com/doc/refman/8.0/en/load-data.html https://dev.mysql.com/doc/refman/8.0/en/datetime.html
if there are other issues regarding set in the loading but is not in exact_times (as it has been solved) please @daniel-dona open other issues
Column
exact_times
in CSVFREQUENCIES.csv
have0
values while columnexact_times
for tableFREQUENCIES
in RDB haveNULL
values.This causes some engines to provide different number of triples, as
NULL
values for references are to be ignored (according to R2RML spec) when generating triples.