Can we retrieve the content of save() and load() in memory without / instead of passing through a file?

alan23273850 commented 4 years ago

Is your feature request related to a problem? Please describe.

I want to measure the coverage of my function in a child process through the API and send the coverage info (especially the missing lines from the .analysis(...) method) to the parent process via pipes (i.e., in memory).

There are two difficulties: (1) Objects of class CoverageData can be sent through pipes, but they lack the "missing lines" information. (2) Objects of class Coverage contain the "missing lines" information, but they cannot be sent through pipes.

Currently I can only save the object's data to a file in the child process with the .save() method and then read it back in the parent process with the .load() method. This causes very frequent file I/O when my experiments are run many times.

Describe the solution you'd like (self is my custom class)

If CoverageData contained all the attributes returned by coverage.analysis(...) and could be sent through pipes, my goal would be met. More precisely, I would like the command self.coverage_data.update(self.coverage.get_data()) to also carry over all the information returned by self.coverage.analysis(file).

Describe alternatives you've considered (self is my custom class)

A simpler way would be to retrieve, in memory, the content that self.coverage.save() writes, presumably as a str, so that I can send it through pipes to the parent process. In addition, I would like self.coverage.load() to accept a string argument, so that this content can be loaded into self.coverage without file I/O.

Additional context

Thank you very much!

nedbat commented 4 years ago

CoverageData has dumps and loads methods for serializing and de-serializing. Do those not work to transmit the collected data to the parent process?

alan23273850 commented 4 years ago

Hello! Thank you for your response! I'm actually using version 4.5.4, so I was not aware of these two methods. However, even in the latest version, 5.2.1, the first paragraph on the official website says: "It does not include information from the analysis phase, to determine what lines could have been executed, or what lines were not executed."

My final goal is to collect the accumulated data (i.e., after many runs of .start() and .stop()) for one particular file in the child process, via _, executable_lines, missing_lines, _ = self.coverage.analysis(file), and to send these two accumulated attributes to my parent process. Since the class CoverageData does not store these two attributes for each file, dumps and loads still cannot solve my problem.

There are many simple in-memory workarounds, so this feature request is not urgent for me, but I think it would be a good feature to add in a future version!
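One such in-memory workaround could look like the sketch below. It assumes that running .analysis(...) inside the child is acceptable, since the two lists it returns are plain lists of integers and can be sent through a pipe as-is. The helper names measure_in_child, run_once and merge_missing are hypothetical and not part of coverage.py.

```python
from multiprocessing import Pipe, Process


def measure_in_child(conn, func, source_file):
    """Run func under coverage in the child and send plain lists to the parent."""
    import coverage  # third-party; imported lazily because only the child needs it

    # With data_file=None and no save() call, no data file is written.
    cov = coverage.Coverage(data_file=None)
    cov.start()
    try:
        func()
    finally:
        cov.stop()
    # analysis() returns (filename, executable, missing, missing_formatted).
    _, executable_lines, missing_lines, _ = cov.analysis(source_file)
    conn.send((executable_lines, missing_lines))  # lists of ints pickle fine
    conn.close()


def run_once(func, source_file):
    """Hypothetical parent-side helper: one child run, one result pair."""
    parent_conn, child_conn = Pipe()
    proc = Process(target=measure_in_child, args=(child_conn, func, source_file))
    proc.start()
    result = parent_conn.recv()
    proc.join()
    return result


def merge_missing(results):
    """Combine (executable, missing) pairs from several runs of the same file.

    A line stays missing only if every run missed it.
    """
    executable = set()
    missing = None
    for exe, miss in results:
        executable |= set(exe)
        missing = set(miss) if missing is None else missing & set(miss)
    return sorted(executable), sorted(missing or ())
```

If each experiment runs in its own child, the parent can collect the pairs returned by run_once and pass them to merge_missing. With the spawn start method, func has to be importable at module level so that it can be pickled.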

devdanzin commented 3 months ago

Maybe we could add a method to Coverage that attaches the missing attributes to CoverageData, emit the values of these new attributes only in dumps, and support them in loads?

Something like (completely untested):

diff --git a/coverage/control.py b/coverage/control.py
index 4e1d359e..0b9876c1 100644
--- a/coverage/control.py
+++ b/coverage/control.py
@@ -868,6 +868,20 @@ class Coverage(TConfigurable):
         assert self._data is not None
         return self._data

+    def enrich_data(self, morf: TMorf) -> None:
+        """Add analysis information to the CoverageData instance in self._data."""
+        if not self._data:
+            self.get_data()
+        assert self._data is not None
+        f, s, e, m, mf = self.analysis2(morf)
+        self._data.analysis_data = dict(
+            filename=f,
+            statements=s,
+            excluded=e,
+            missing=m,
+            missing_formatted=mf,
+        )
+
     def _post_save_work(self) -> None:
         """After saving data, look for warnings, post-work, etc.

diff --git a/coverage/sqldata.py b/coverage/sqldata.py
index e739c39c..d43a6b6d 100644
--- a/coverage/sqldata.py
+++ b/coverage/sqldata.py
@@ -10,6 +10,7 @@ import datetime
 import functools
 import glob
 import itertools
+import json
 import os
 import random
 import socket
@@ -255,6 +256,7 @@ class CoverageData:
         self._current_context: str | None = None
         self._current_context_id: int | None = None
         self._query_context_ids: list[int] | None = None
+        self.analysis_data = {}

     __repr__ = auto_repr

@@ -373,8 +375,9 @@ class CoverageData:
         if self._debug.should("dataio"):
             self._debug.write(f"Dumping data from data file {self._filename!r}")
         with self._connect() as con:
-            script = con.dump()
-            return b"z" + zlib.compress(script.encode("utf-8"))
+            script = con.dump().encode("utf-8")
+            analysis_data = json.dumps(self.analysis_data).encode("utf-8")
+            return b"z" + zlib.compress(analysis_data + b"|ANALYSIS_DATA_MARKER|" + script)

     def loads(self, data: bytes) -> None:
         """Deserialize data from :meth:`dumps`.
@@ -397,7 +400,8 @@ class CoverageData:
             raise DataError(
                 f"Unrecognized serialization: {data[:40]!r} (head of {len(data)} bytes)",
             )
-        script = zlib.decompress(data[1:]).decode("utf-8")
+        analysis_data, script = zlib.decompress(data[1:]).decode("utf-8").split("|ANALYSIS_DATA_MARKER|", 1)
+        self.analysis_data = json.loads(analysis_data)
         self._dbs[threading.get_ident()] = db = SqliteDb(self._filename, self._debug)
         with db:
             db.executescript(script)

The only visible change would be to the output of dumps and the input of loads, whose format is documented as free to change from version to version:

   The format of the serialized data is not documented. It is only
   suitable for use with :meth:`loads` in the same version of
   coverage.py.
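The framing used in the diff can be tried out in isolation with a minimal standard-library sketch. The names pack and unpack are hypothetical stand-ins for the patched dumps and loads, and the SQL script is passed in as an ordinary string. Splitting only at the first marker keeps the unpacking safe even if the SQL dump itself happens to contain the marker text.

```python
import json
import zlib

MARKER = b"|ANALYSIS_DATA_MARKER|"


def pack(analysis_data, script):
    """Prefix the SQL script with a JSON header, then compress both together."""
    header = json.dumps(analysis_data).encode("utf-8")
    return b"z" + zlib.compress(header + MARKER + script.encode("utf-8"))


def unpack(data):
    """Reverse pack(): return (analysis_data, script)."""
    if data[:1] != b"z":
        raise ValueError("unrecognized serialization")
    raw = zlib.decompress(data[1:])
    # partition() splits at the first marker only, so a marker inside the
    # script does not break the unpacking.
    header, _, script = raw.partition(MARKER)
    return json.loads(header), script.decode("utf-8")
```

This still assumes the JSON header never contains the marker (for example inside a file name). A length prefix in front of the header would remove that assumption.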

nedbat / coveragepy, issue #1016