Closed djnakabaale closed 4 years ago
@dmj2x thanks for the detailed issue.
Regrading your Cypher script, I find this bit suspicious:
# convert startDate to nanoseconds
let toNano=$col2*1000000
# add add duration to startDate
let endDate=$(expr $toNano + $col3)
For Interactive Q4, col3
is specified in days, so in order to add it to a nanosecond value, you would first need to convert it to nanoseconds by multiplying it by 864000000000.
Regarding representing dates in Neo4j, there are three key approaches:
dev
branch.yyyyMMddhhmmss000
(the last triple zeros representing milliseconds). This is used in the stable
branch. The conversion from epochs to such values is handled by the CypherConverter
class and I also have some Bash scripts for convenience.I'll look into the Postgres code later.
Hi @szarnyasg,
Regrading your Cypher script, I find this bit suspicious: For Interactive Q4,
col3
is specified in days, so in order to add it to a nanosecond value, you would first need to convert it to nanoseconds by multiplying it by 864000000000.
I have updated my script to convert days to nanoseconds first,
let toNano=$col2*1000000
let daysToNano=$col3*864000000000
# add add duration to startDate
let endDate=$(( $toNano + $daysToNano ))
{ echo ":param personId=>$col1"; echo ":param startDate=>$col2"; echo ":param endDate=>$endDate"; cat ./queries/interactive-complex'-'$1.cypher ; } | neo4j-server/bin/cypher-shell --format verbose -u neo4j -p admin
For some reason, the query results are the same even after I modified the script.
Also, the column creationdate to which startDate is compared, is in nanoseconds however when I convert startDate to nanoseconds first, I get no query results as opposed to using startDate date unchanged.
let startDate=$col2*1000000
let daysToNano=$col3*864000000000
let endDate=$(( $startDate + $daysToNano ))
{ echo ":param personId=>$col1"; echo ":param startDate=>$startDate"; echo ":param endDate=>$endDate"; cat ./queries/interactive-complex-4.cypher ; } | neo4j-server/bin/cypher-shell --format verbose -u neo4j -p admin
I'll look into the Postgres code later.
Thank you for taking time to look into this.
How do you load the data to Neo4j? In the generated files, they are not in epoch but using the String formatter:
yyyy-MM-dd'T'HH:mm:ss.SSSZ
(This is an almost RFC 3339-compliant format. A fully compliant String formatter would use e.g. yyyy-MM-dd'T'HH:mm:ss.SSS+00:00
.)
How do you load the data to Neo4j? In the generated files, they are not in epoch but using the String formatter:
I use neo4j-import that's available in neo4j 3.5
To match the format of the timestamps created by the loader, you'll need a slightly more complicated Bash script:
EPOCH_MILLI=$col2
EPOCH_SEC=$(($EPOCH_MILLI / 1000))
export TZ=GMT
TIMESTAMP=`date +%Y%m%d%H%M%S000 --date @$EPOCH_SEC`
To match the format of the timestamps created by the loader, you'll need a slightly more complicated Bash script:
TIMESTAMP=`date +%Y%m%d%H%M%S000 --date @$EPOCH_SEC`
I'm working on mac with bash version _GNU bash, version 3.2.57(1)-release (x8664-apple-darwin19) and it seems not to like the format, for example running
EPOCH_MILLI=1335830400000
EPOCH_SEC=$(($EPOCH_MILLI / 1000))
export TZ=GMT
TIMESTAMP=`date +%Y%m%d%H%M%S000 --date @$EPOCH_SEC`
results into
date: illegal time format
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
[-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
Interesting, what's the output for you when running this?
date --version
For me it's
date (GNU coreutils) 8.32
Interesting, what's the output for you when running this?
date --version
Interesting indeed, on mac
macOS Catalina version 10.15.7
the output is
date --version
date: illegal option -- - usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ... [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
I didn't have gdate so installed version date (GNU coreutils) 8.32 with homebrew script seems to work fine and generates a time stamp.
function epoch () {
EPOCH_MILLI=$1
EPOCH_SEC=$(($EPOCH_MILLI / 1000))
export TZ=GMT
TIMESTAMP=`gdate +%Y%m%d%H%M%S000 --date @$EPOCH_SEC`
echo $TIMESTAMP
}
Also added another function to get the final end date
function epoch_end_date () {
START_DATE=$1
DAYS=$2
DAYS_TO_MS=$(($DAYS * 86400000))
EPOCH_MILLI=$(($START_DATE + $DAYS_TO_MS))
EPOCH_SEC=$(($EPOCH_MILLI / 1000))
export TZ=GMT
TIMESTAMP=`gdate +%Y%m%d%H%M%S000 --date @$EPOCH_SEC`
echo $TIMESTAMP
}
All seems to work fine and I get results on some substitution params but not all. I would like to inquire on how I can verify the query results from by queries 3 & 4.
I've tried it for one set of params per query. I had to add an echo ";";
do the script to get it working. This approach returns results for both queries.
$ { echo ":param personId=>21990232559429"; echo ":param startDate=>20120501000000000"; echo ":param endDate=>20120607000000000"; cat ./queries/interactive-complex-4.cypher; echo ";"; } | neo4j-server/bin/cypher-shell --format verbose -u neo4j -p admin
+------------------------------------------------+
| tagName | postCount |
+------------------------------------------------+
| "Hassan_II_of_Morocco" | 2 |
| "Appeal_to_Reason" | 1 |
| "Principality_of_Littoral_Croatia" | 1 |
| "Rivers_of_Babylon" | 1 |
| "Van_Morrison" | 1 |
+------------------------------------------------+
{ echo ":param personId=>21990232559429"; echo ":param startDate=>20120501000000000"; echo ":param endDate=>20120607000000000"; echo ":param countryXName: 'Belgium'"; echo ":param countryYName: 'Netherlands'"; cat ./queries/interactive-complex-3.cypher; echo ";"; } | neo4j-server/bin/cypher-shell --format verbose -u neo4j -p admin
+-----------------------------------------------------------------------------+
| personId | personFirstName | personLastName | xCount | yCount | count |
+-----------------------------------------------------------------------------+
| 8796093028172 | "Karan" | "Khan" | 1 | 1 | 2 |
| 13194139537010 | "Rahul" | "Rao" | 1 | 1 | 2 |
+-----------------------------------------------------------------------------+
I'm polishing up the script and looking into how it performs for other params.
Here's a script that works for me. Note that this only checks the first 4 lines of the params txt files (i.e. the first 3 parameter combinations) and uses an absolute path in my file system:
function epoch () {
EPOCH_MILLI=$1
EPOCH_SEC=$(($EPOCH_MILLI / 1000))
export TZ=GMT
TIMESTAMP=`date +%Y%m%d%H%M%S000 --date @$EPOCH_SEC`
echo $TIMESTAMP
}
function epoch_end_date () {
START_DATE=$1
DAYS=$2
DAYS_TO_MS=$(($DAYS * 86400000))
EPOCH_MILLI=$(($START_DATE + $DAYS_TO_MS))
EPOCH_SEC=$(($EPOCH_MILLI / 1000))
export TZ=GMT
TIMESTAMP=`date +%Y%m%d%H%M%S000 --date @$EPOCH_SEC`
echo $TIMESTAMP
}
{
read
while IFS='|' read -r col1 col2 col3 col4 col5
do
echo $col1 $col2 $col3 $col4 $col5
let startDate=`epoch $col2`
let endDate=`epoch_end_date $col2 $col3`
{ echo ":param personId=>$col1"; echo ":param startDate=>$startDate"; echo ":param endDate=>$endDate"; echo ":param countryXName: '$col4'"; echo ":param countryYName: '$col5'"; cat ./queries/interactive-complex-3.cypher; echo ";"; } | neo4j-server/bin/cypher-shell --format verbose -u neo4j -p admin
((i++))
done
} < <(head -n 4 ~/git/snb/ldbc_snb_datagen/substitution_parameters-sf1/interactive_3_param.txt)
{
read
while IFS='|' read -r col1 col2 col3
do
echo $col1 $col2 $col3
let startDate=`epoch $col2`
let endDate=`epoch_end_date $col2 $col3`
{ echo ":param personId=>$col1"; echo ":param startDate=>$startDate"; echo ":param endDate=>$endDate"; cat ./queries/interactive-complex-4.cypher; echo ";"; } | neo4j-server/bin/cypher-shell --format verbose -u neo4j -p admin
((i++))
done
} < <(head -n 4 ~/git/snb/ldbc_snb_datagen/substitution_parameters-sf1/interactive_4_param.txt)
The outputs of these commmands match with the relevant lines of validation_params.csv
:
["com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcQuery3",17592186055119,"Laos","Scotland",1306886400000,42,20]|[[8796093029689,"Eun-Hye","Yoon",1,1,2]]
["com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcQuery3",8796093030404,"Uruguay","Scotland",1298937600000,28,20]|[]
["com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcQuery3",2199023266354,"Scotland","Slovakia",1301616000000,30,20]|[]
["com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcQuery4",21990232559429,1335830400000,37,10]|[["Hassan_II_of_Morocco",2],["Appeal_to_Reason",1],["Principality_of_Littoral_Croatia",1],["Rivers_of_Babylon",1],["Van_Morrison",1]]
["com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcQuery4",24189255814337,1343779200000,29,10]|[["Ehud_Olmert",3],["Be-Bop-A-Lula",1],["Kingdom_of_Sardinia",1],["The_Singles:_The_First_Ten_Years",1]]
["com.ldbc.driver.workloads.ldbc.snb.interactive.LdbcQuery4",2199023261318,1343779200000,29,10]|[["Ehud_Olmert",3],["Be-Bop-A-Lula",1],["Kingdom_of_Sardinia",1],["The_Singles:_The_First_Ten_Years",1]]
I've tried it for one set of params per query.
$ { echo ":param personId=>21990232559429"; echo ":param startDate=>20120501000000000"; echo ":param endDate=>20120607000000000"; cat ./queries/interactive-complex-4.cypher; echo ";"; } | neo4j-server/bin/cypher-shell --format verbose -u neo4j -p admin +------------------------------------------------+ | tagName | postCount | +------------------------------------------------+ | "Hassan_II_of_Morocco" | 2 | | "Appeal_to_Reason" | 1 | | "Principality_of_Littoral_Croatia" | 1 | | "Rivers_of_Babylon" | 1 | | "Van_Morrison" | 1 | +------------------------------------------------+
{ echo ":param personId=>21990232559429"; echo ":param startDate=>20120501000000000"; echo ":param endDate=>20120607000000000"; echo ":param countryXName: 'Belgium'"; echo ":param countryYName: 'Netherlands'"; cat ./queries/interactive-complex-3.cypher; echo ";"; } | neo4j-server/bin/cypher-shell --format verbose -u neo4j -p admin +-----------------------------------------------------------------------------+ | personId | personFirstName | personLastName | xCount | yCount | count | +-----------------------------------------------------------------------------+ | 8796093028172 | "Karan" | "Khan" | 1 | 1 | 2 | | 13194139537010 | "Rahul" | "Rao" | 1 | 1 | 2 | +-----------------------------------------------------------------------------+
Great, thanks again @szarnyasg. I have similar values when I use the parameters above. I also seem to be getting consistent results for queries 3 and 4 on scale factor one, for both cypher an postgres. I am doing more experiments with scale factors 10 and 30, and I hope to check-in back here as I get results.
Great, I hope the queries work on larger SFs as well. Feel free to reopen this issue if they don't work.
This question is in reference to an earlier question/ issue #135 and I would like to follow it up with a another question.
I am experimenting with both cypher and Postgres queries on SF1 dataset, running directly in the respective servers. All queries seem to run fine and I get similar results, except for queries 3 & 4. This appears to be an issue with the endDate variable that I hope to get some guidance on, as I may be missing something. Here are some of the details in the scripts I run for query 4.
For Cypher:
Both these queries have an endDate variable for which I have to convert startDate to nano seconds before adding durationDays to it.
For Postgres:
In postgres I initially got this error
ERROR: invalid input syntax for type interval
, and I made a minor modification to the condition in the where clause of the nested select statement, adding timestamp to it;m_creationdate >= :startDate and m_creationdate < ((:startDate::timestamp) + INTERVAL '1 days' * :durationDays)
.The results from both postgres and cypher are very different. I'll be happy to share some of the query results if that helps.