marklister / product-collections

A very simple, strongly typed, scala framework for tabular data. A collection of tuples. A strongly typed scala csv reader and writer. A lightweight idiomatic dataframe / datatable alternative.
BSD 2-Clause "Simplified" License
144 stars 19 forks source link

Not thread-safe #33

Closed irundaia closed 9 years ago

irundaia commented 9 years ago

I've been trying to run some unit tests on my parsers. To speed things up, I wanted to run all tests in parallel. However, if I do that, my tests become unreliable (i.e. failing inconsistently).

marklister commented 9 years ago

Iterators are not thread safe as far as i know. Can you post some code here (as simple as possible) so i can see if i have some suggestions?

marklister commented 9 years ago

A CollSeq, on the other hand, should be thread safe.

irundaia commented 9 years ago

To be honest, I'm guessing I'm using iterators. How would I go about changing this to a CollSeq?

To add a short example: I've got a simple cvs file which I'm parsing using (assuming I've got an implicit transformation from a Product6 -> Tuple6 and a DateConverter):

class MyParser extends CSVParserTrait {
  override protected def parse(inputReader: Reader): Seq[Try[Transaction]] =
    CsvParser[String, Date, String, String, String, String].parse(inputReader, ";").map(p => convertToTransaction(p))

  private def convertToTransaction = (convert _).tupled
  private def convert(
      ownAccount: String,
      date: Date,
      t: String,
      amount: String,
      otherAccount: String,
      description: String): Try[Transaction] =
    Try {
      Transaction(
        amount.replaceAll(",", "").toInt,
        new Timestamp(date.getTime),
        if (t.equalsIgnoreCase("bij")) otherAccount else ownAccount,
        if (!t.equalsIgnoreCase("bij")) otherAccount else ownAccount,
        description,
        None
      )
    }
}

I would then run three tests such as:

class MyParserSpec extends Specification {

  val correctIncoming = "\"3753-882237\";\"21-11-2008\";\"BIJ\";2291,34;RABOIBAN;\"American Express \""
  val correctOutgoing = "\"3753-882237\";\"20-11-2008\";\"AF\";86,40;RABOIBAN;\"DROGISTERIJ.NET, STELLENDAM\""
  val multipleInput = Seq(correctIncoming, correctOutgoing).mkString("\n")

  "MyParser" >> {
    "be able to parse an incoming transaction" >> {
      val transactions = (new MyParser()).parse(correctIncoming)

      val transaction = transactions.head.get
      transaction.amount must beEqualTo(229134)
      transaction.category must beNone
      transaction.description must beEqualTo("American Express ")
      transaction.fromAccount must beEqualTo("RABOIBAN")
      transaction.toAccount must beEqualTo("3753-882237")
      transaction.date.compareTo(new Timestamp(INGParser.dateParser.parse("21-11-2008").getTime)) must beEqualTo(0)
    }

    "be able to parse an outgoing transaction" >> {
      val transactions = (new MyParser()).parse(correctOutgoing)

      val transaction = transactions.head.get
      transaction.amount must beEqualTo(8640)
      transaction.category must beNone
      transaction.description must beEqualTo("DROGISTERIJ.NET, STELLENDAM")
      transaction.fromAccount must beEqualTo("3753-882237")
      transaction.toAccount must beEqualTo("RABOIBAN")
      transaction.date.compareTo(new Timestamp(INGParser.dateParser.parse("20-11-2008").getTime)) must beEqualTo(0)
    }

    "be able to parse multiple transactions in a single input file/stream" >> {
      val transactions = (new MyParser()).parse(multipleInput)
      transactions.count(_.isSuccess) must beEqualTo(2)
    }

  }

}

Running these tests in parallel, will make then succeed and fail unpredictably.

marklister commented 9 years ago

http://stackoverflow.com/questions/6840803/simpledateformat-thread-safety

Can you try omitting the date field? See what happens?

marklister commented 9 years ago

By the way you are using a CollSeq -- CollSeq6 to be exact. I should make it easier to produce a Try[_]

irundaia commented 9 years ago

I'm sorry to have bothered you with this. It hadn't occurred to me that SimpleDateFormat isn't thread safe... -.-

I've changed my implementation of the DateConverter to use the (thread safe) FastDateFormat from apache commons lang. Now, I haven't been able to get any inconsistent test results.

Thank you for your support!

marklister commented 9 years ago

It's no bother and it hadn't occurred to me to check sdf's thread safety. Thanks for reporting this. I dropped SimpleDateFormat simply to align the scala-jvm and scala-js versions. Your bug report is still useful to anyone else trying to achieve the same thing. And I'll make a note in the docs about thread safety and sdf.