Open ghost opened 8 years ago
That looks fine, except perhaps the last folder... I personally think results/Intel-i7/05-15-14-34-22/Integer_Boxed_MULTIPLY_04.csv might be better, because I can't think of a situation where you wouldn't load all of the types at once (i.e. results/Intel-i7/05-15-14-34-22/*.csv). But that's a matter of taste; it's up to you, since you will do most of the graphing.
So a double redundancy, with the entire folder hierarchy "encoding" all of the information, PLUS the filename encoding all of the information?
Sure I can do it like that.
The only thing I consider truly important is that all of the information is encoded within the CSV file itself. This is exactly why I have some additional columns that might appear slightly redundant. They make sure that the tables conform to the rules of relational algebra: the descriptive columns of each row together form a composite primary key that is unique among the potentially millions of rows that will be generated.
It also makes processing and "grouping" the data easier in R. With these additional columns it's possible to do in 1-2 lines what would take God knows how many lines, perhaps 100 lines in Python or Matlab.
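To illustrate the point about self-describing rows, here is a minimal Scala sketch. The column layout (`cpu`, `run`, `dataType`, `operation`, `size`, `nanos`) is purely hypothetical and only stands in for whatever columns Hayabaya actually writes. It checks that the descriptive columns form a unique composite key and then groups on two of them.

```scala
object SelfDescribingRows {
  // Hypothetical row layout; the real Hayabaya columns may differ.
  final case class Row(cpu: String, run: String, dataType: String,
                       operation: String, size: Int, nanos: Long)

  // Parses one comma-separated line in the assumed column order.
  def parse(line: String): Row = {
    val f = line.split(",").map(_.trim)
    Row(f(0), f(1), f(2), f(3), f(4).toInt, f(5).toLong)
  }

  // The composite key: every column except the measurement itself.
  def key(r: Row): (String, String, String, String, Int) =
    (r.cpu, r.run, r.dataType, r.operation, r.size)

  // True when no two rows share the same composite key.
  def hasUniqueKeys(rows: Seq[Row]): Boolean =
    rows.map(key).distinct.size == rows.size

  // Mean measured time per (dataType, operation) group.
  def meanNanos(rows: Seq[Row]): Map[(String, String), Double] =
    rows.groupBy(r => (r.dataType, r.operation))
      .map { case (k, rs) => k -> rs.map(_.nanos).sum.toDouble / rs.size }
}
```

Because every row carries its own context, rows from any number of files can be concatenated and grouped without consulting folder or file names.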
Please see the Tutanota e-mail regarding Friday.
I'm reading the Scalastyle source code, as it's relevant to the CLI parsing and it's a well-written project: https://github.com/scalastyle/scalastyle
Do you know of any other small Scala projects that I can read as a beginner?
I have said it before, but to make sure it's explicit: I know most of the "components" of Scala, but when it comes to putting it all together I'm really having a hard time. This is only transient and will disappear as I continue writing more code; it was just like this when I started with R and later Java. But in the meantime, having good repositories to read can be a TRULY invaluable blessing. So if you know of any reasonably sized projects besides Scalastyle that I can check out, please let me know!
@ktakagaki
Haven't done much reading myself, but these might be to-the-point and not too daunting: https://github.com/garyKeorkunian/squants https://github.com/non/spire
@ktakagaki I recently learned how to "easily" call Git from inside Scala using the ProcessBuilder in sys.process.
Here is a small example that prints the SHA-1 hash of the current git HEAD when run (assuming Git is installed on the host and properly added to the system's environment variables, as it always should be):
    package de.lin.hayabaya.playground

    import scala.sys.process._

    object SystemCall {
      // Runs `ls -al` and returns its captured standard output.
      def printLines(): String = "ls -al".!!

      // Returns the SHA-1 hash of the current git HEAD.
      // `!!` already yields a String; trim drops the trailing newline.
      def printHash(): String = "git rev-parse HEAD".!!.trim

      def main(args: Array[String]): Unit = {
        println("Hello")
        val hash = SystemCall.printHash()
        println("The git hash is: " + hash)
      }
    }
The resulting output is simply:

    Hello
    The git hash is: 7984a34f61eedfbc6bc0c70e7346fb0de72ed053
Therefore, based on this approach, I suggest we simplify and flatten the previously drafted hierarchy for file output by exploiting git SHA-1 hashes.
So we go back to just outputting files into a /results folder, and each time Hayabaya is run, it proceeds roughly like this:
So all of the different operations, data types, etc. are stored in one CSV file, plus a new column containing the 5-character string for creating a unique group/run ID later on when loading multiple CSV files.
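A rough sketch of how such a run-ID column could be produced is below. It assumes the 5-character string is simply the first five characters of the commit hash; that is my reading of the proposal, not something spelled out above. The object and method names (`RunId`, `fromHash`, `current`, `tagRow`) are hypothetical.

```scala
import scala.sys.process._
import scala.util.Try

object RunId {
  // Assumption: the run ID is the first 5 characters of the commit hash.
  def fromHash(hash: String): String = hash.trim.take(5)

  // Asks git for the current HEAD hash.
  // Returns None if git is missing or the working directory is not a repository.
  def current(): Option[String] =
    Try("git rev-parse HEAD".!!).toOption.map(fromHash)

  // Appends the run ID as an extra trailing column of a CSV row.
  def tagRow(runId: String, csvRow: String): String = s"$csvRow,$runId"
}
```

Note that two runs made from the same commit would get the same ID under this scheme, so a timestamp or counter may still be needed if runs of the same commit have to be told apart.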
We discussed reordering the structure of the results folder when saving the results from Hayabaya. The intention was to enable running the experiment multiple times without risking any files being overwritten.
Which organization of the tree hierarchy do you prefer?
Suggestions: ( "/" denotes a new level of folders, "<", ">" used for meta-naming)