Open nicolas-geysse opened 5 months ago
An approach more closely related to attributes and types is the "Extension of Data Sources" point, which focuses on integrating geospatial and temporal data while preserving attributes and types:
Extension of Data Sources
• Addition: Data import with attribute and type preservation
• Addition: Support for geospatial and temporal data
• Libraries: qwikidata, geopandas, pandas
• Benefits: Enrichment of graphs with structured and geotemporal data
Implementation:
```python
from txtai.graph import Graph
import geopandas as gpd
import pandas as pd
from qwikidata.linked_data_interface import get_entity_dict_from_api
from shapely.geometry import box


class EnhancedGraph(Graph):
    """Graph that also tracks geospatial and temporal node data."""

    def __init__(self):
        # Assumption carried over from the original sketch: the base class can be
        # built without arguments and exposes a NetworkX graph as self.graph.
        # Adapt both to the Graph API of the txtai version in use.
        super().__init__()
        # Rows are collected in lists and turned into frames on demand,
        # because DataFrame.append was removed in pandas 2.0.
        self.geo_rows = []
        self.temporal_rows = []
        self.crs = None

    def add_geospatial_node(self, node_id, geometry, **attrs):
        self.graph.add_node(node_id, geometry=geometry, **attrs)
        self.geo_rows.append({"node_id": node_id, "geometry": geometry, **attrs})

    def add_temporal_node(self, node_id, timestamp, **attrs):
        self.graph.add_node(node_id, timestamp=timestamp, **attrs)
        self.temporal_rows.append({"node_id": node_id, "timestamp": timestamp, **attrs})

    def import_wikidata(self, entity_id):
        entity_dict = get_entity_dict_from_api(entity_id)
        node_id = entity_dict["id"]
        # "claims" maps each property ID to a list of statements.
        # Design choice for this sketch: keep the first available value per property.
        attrs = {}
        for prop, statements in entity_dict["claims"].items():
            for statement in statements:
                snak = statement["mainsnak"]
                if "datavalue" in snak:
                    attrs[prop] = snak["datavalue"]["value"]
                    break
        self.graph.add_node(node_id, **attrs)
        return node_id

    def to_geopandas(self):
        # Assumes at least one geospatial node has been added.
        return gpd.GeoDataFrame(self.geo_rows, geometry="geometry", crs=self.crs)

    def to_temporal_pandas(self):
        # Assumes at least one temporal node has been added.
        return pd.DataFrame(self.temporal_rows)

    def import_geojson(self, file_path):
        gdf = gpd.read_file(file_path)
        self.crs = gdf.crs
        for idx, row in gdf.iterrows():
            # Drop the geometry column so it is not passed twice.
            attrs = row.drop("geometry").to_dict()
            self.add_geospatial_node(idx, row.geometry, **attrs)

    def import_temporal_csv(self, file_path, timestamp_col, node_id_col):
        df = pd.read_csv(file_path, parse_dates=[timestamp_col])
        for _, row in df.iterrows():
            # Drop the ID and timestamp columns so they are not stored twice.
            attrs = row.drop([node_id_col, timestamp_col]).to_dict()
            self.add_temporal_node(row[node_id_col], row[timestamp_col], **attrs)

    def spatial_query(self, geometry):
        gdf = self.to_geopandas()
        return gdf[gdf.intersects(geometry)]

    def temporal_query(self, start_time, end_time):
        df = self.to_temporal_pandas()
        # between() is inclusive on both ends by default.
        return df[df["timestamp"].between(start_time, end_time)]


graph = EnhancedGraph()

# Import geospatial data
graph.import_geojson("cities.geojson")

# Import temporal data
graph.import_temporal_csv("events.csv", timestamp_col="event_date", node_id_col="event_id")

# Import Wikidata
node_id = graph.import_wikidata("Q64")

# Hypothetical query area: a lon/lat bounding box around Berlin.
# The original left some_polygon undefined.
some_polygon = box(13.0, 52.3, 13.8, 52.7)

# Perform spatial and temporal queries
cities_in_area = graph.spatial_query(some_polygon)
events_in_timeframe = graph.temporal_query(pd.Timestamp("2023-01-01"), pd.Timestamp("2023-12-31"))

# Convert to GeoDataFrame or DataFrame for further analysis
gdf = graph.to_geopandas()
temporal_df = graph.to_temporal_pandas()
```
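One detail worth checking offline is the Wikidata import: the entity JSON groups claims by property ID, and each property maps to a list of statements, so flattening claims into node attributes needs a nested loop. Below is a minimal sketch of that step on a hand-written dict that mimics the response shape. The helper name `flatten_claims`, the sample values, and the `P9999` entry are illustrative assumptions, not real Wikidata data, and keeping only the first value per property is a simplification.

```python
def flatten_claims(entity_dict):
    """Return {property_id: first available value} from a Wikidata-style entity dict."""
    attrs = {}
    for prop, statements in entity_dict.get("claims", {}).items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            # Snaks of type "novalue" or "somevalue" carry no datavalue; skip them.
            if "datavalue" in snak:
                attrs[prop] = snak["datavalue"]["value"]
                break
    return attrs


# Hand-written sample mimicking the API response shape (values are made up).
sample_entity = {
    "id": "Q64",
    "claims": {
        "P1082": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P1082",
                    "datavalue": {
                        "value": {"amount": "+3500000", "unit": "1"},
                        "type": "quantity",
                    },
                }
            }
        ],
        # Hypothetical property with no value, to show the skip path.
        "P9999": [{"mainsnak": {"snaktype": "novalue", "property": "P9999"}}],
    },
}

node_attrs = flatten_claims(sample_entity)
print(node_attrs)
```

Values stay in their original Wikidata structure (here a quantity dict), which keeps type information intact at the cost of nested attributes on the node.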
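This implementation enhances TxtAI's graph capabilities by storing geometries and timestamps as node attributes, importing GeoJSON, CSV and Wikidata records, supporting spatial and temporal queries, and exporting the results to geopandas and pandas.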
Regarding the initial type problem: this implementation does not directly solve the specific issue of adding a 'type' attribute to nodes. It addresses it indirectly, because node attributes are passed through to the graph as given, so a 'type' attribute (or a more complex data type) can be stored and queried like any other attribute.
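As a rough illustration of that point, here is a minimal sketch using NetworkX directly rather than txtai's Graph class. The node IDs, labels, and the `nodes_of_type` helper are hypothetical and only show that 'type' behaves like any other node attribute.

```python
import networkx as nx

graph = nx.Graph()

# 'type' is stored alongside any other attribute.
graph.add_node("Q64", type="city", label="Berlin")
graph.add_node("event-1", type="event", label="Example event")
graph.add_node("untyped")  # nodes without a type are allowed
graph.add_edge("event-1", "Q64", type="located_in")


def nodes_of_type(g, node_type):
    """Return the IDs of all nodes whose 'type' attribute equals node_type."""
    return [n for n, data in g.nodes(data=True) if data.get("type") == node_type]


cities = nodes_of_type(graph, "city")
events = nodes_of_type(graph, "event")
print(cities, events)
```

Using `data.get("type")` instead of `data["type"]` keeps the query safe for nodes that were added without a type.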
The approach is well-integrated with TxtAI's ecosystem, extending its Graph class and using compatible libraries like geopandas and pandas. It also leverages NetworkX's underlying graph structure while adding geospatial and temporal capabilities on top of it.
Here's a plan to enhance TxtAI with geospatial and temporal search capabilities:
1. Extend indexing for geospatial data:
2. Implement temporal search functionalities:
3. Integrate with existing semantic search:
This implementation:
To use this enhanced graph:
This approach extends TxtAI's capabilities while maintaining simplicity and integration with its existing ecosystem.
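The three steps of the plan could be sketched roughly as follows. This is a dependency-free illustration, not the proposed implementation: it swaps geopandas geometry for a simple lon/lat bounding box, uses `datetime` instead of pandas, and treats `results` as a stand-in for the (id, score) pairs a semantic search might return. `NodeMeta`, `in_bbox`, `in_range`, and `filter_results` are hypothetical names, and the sample data is made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class NodeMeta:
    """Geospatial and temporal metadata indexed per node (steps 1 and 2)."""
    lon: Optional[float] = None
    lat: Optional[float] = None
    timestamp: Optional[datetime] = None


def in_bbox(meta, bbox):
    """True if the node has coordinates inside (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (
        meta.lon is not None
        and meta.lat is not None
        and min_lon <= meta.lon <= max_lon
        and min_lat <= meta.lat <= max_lat
    )


def in_range(meta, start, end):
    """True if the node has a timestamp within [start, end]."""
    return meta.timestamp is not None and start <= meta.timestamp <= end


def filter_results(results, metadata, bbox=None, start=None, end=None):
    """Step 3: keep (node_id, score) pairs that pass the optional filters."""
    kept = []
    for node_id, score in results:
        meta = metadata.get(node_id)
        if meta is None:
            continue
        if bbox is not None and not in_bbox(meta, bbox):
            continue
        if start is not None and end is not None and not in_range(meta, start, end):
            continue
        kept.append((node_id, score))
    return kept


# Made-up metadata and search results for illustration.
metadata = {
    "a": NodeMeta(13.4, 52.5, datetime(2023, 5, 1)),
    "b": NodeMeta(2.35, 48.85, datetime(2023, 6, 1)),
    "c": NodeMeta(13.5, 52.4, datetime(2021, 1, 1)),
}
results = [("a", 0.9), ("b", 0.8), ("c", 0.7)]

berlin_2023 = filter_results(
    results,
    metadata,
    bbox=(13.0, 52.3, 13.8, 52.7),
    start=datetime(2023, 1, 1),
    end=datetime(2023, 12, 31),
)
print(berlin_2023)
```

Filtering after the semantic search preserves the ranking order of the results. For large indexes it may be cheaper to filter first and search only the remaining candidates.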