lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Need help with terminals not showing up as expected #1399

Closed: justrajdeep closed this 2 months ago

justrajdeep commented 7 months ago

Hi,

Building on my previous question (https://github.com/lark-parser/lark/issues/1398), I have completed the full parser:

text = '''7212775000:H:uvm_test_top:[slc_base_test/cex3_debug] socket:0, cex3 txn : t: (cpu_dv_scc_cex3_uvc::transaction@8320904) { response_expected: 0x0  txn_phase: TXN_PHASE_UNINITIALIZED  accept_time: 7212775000  begin_time: 7212775000  end_time: 7212775000  begin_cycle: 3288  end_cycle: 3288  pd: { tag: PKT_dv_scc_cex3_dv_scc_client_req_cex3_signals  snoop_c2chub_id_eviction: 0x0  snoop_c2chub_id_hit: 0x0  ctag_gv_next: 0x0  ctag_gv_current: 0x0  is_la: 0x0  is_ephemeral: 0x0  cf_user_field: 0x732090cb  mecid: 0x0  trace_tag: 0x1  vc: ucf_slc_chi_reqsnp_intf_vc_t_NON_BLOCKING_LL  mpam: 0x151  exp_comp_ack: 0x0  excl: 0x0  stash_group_id: 0x9f  snp_attr: 0x1  mem_attr: 0xd  pcrd_type: 0x0  order: ucf_slc_chi_req_order_t_CHI_REQ_ORDER_NO_ORDER  allow_retry: 0x1  likely_shared: 0x0  size: ucf_slc_chi_req_size_t_CHI_SIZE_64B  opcode: ucf_slc_chi_req_opcode_internal_t_CHI_REQ_STASHONCESHARED  return_txn_id: 0x2  stash_nid_valid: 0x0  return_nid: 0x5f3  retryid: 0x0  ldid: 0x3a  stash_nid: 0x73  slcrephint: 0x73  reqsrcattr: 0x2  decerr: 0x0  scc_drop_rsp: 0x0  ncr: 0x0  clean: 0x1  return_state: ucf_slc_chi_req_return_state_t_S  addr: 0x1000fe8c82a  ns: 0x0  nse: 0x0  txn_id: 0x69d  tgt_id: 0xcf  src_id: 0x9a  qos: 0x3  is_volatile: 0x0  tgt_ldid: 0xa0a  ctag_hit_mecid: 0x3e5  bypass_valid: 0x0  owt_atomic_have_ownership_no_allocation_executed: 0x0  owt_atomic_have_ownership_no_allocation_yet_to_execute: 0x0  owt_atomic_have_ownership_allocating: 0x0  owt_iocrd_have_ownership_allocating: 0x0  owt_iocwr_have_ownership_allocating: 0x0  owt_wb_or_l3_eviction_data_received_allocating: 0x0  owt_wb_or_l3_eviction_data_received_no_allocation: 0x0  owt_wb_or_l3_eviction_waiting_data: 0x0  owt_outstanding_ioc_l4_req: 0x0  owt_outstanding_ownership_l4_req: 0x0  ort_l4_req_outstanding: 0x0  ort_received_or_have_ownership: 0x1  owt_blocking_tail: 0x0  owt_blocking_head: 0x0  art_l3_eviction_blocking_tail: 0x0  art_l3_eviction_blocking_head: 0x0  art_esnoop_blocking_tail: 0x0  
art_esnoop_blocking_head: 0x0  art_l2dir_eviction_blocking_tail: 0x0  art_l2dir_eviction_blocking_head: 0x0  art_non_eviction_blocking_tail: 0x0  art_non_eviction_blocking_head: 0x0  ort_blocking_tail: 0x0  ort_blocking_head: 0x0  ort_dealloc_head: 0x0  companion_way2_eviction: 0x0  companion_way1_eviction: 0x0  rar_is_supplant: 0x0  eat_hit_shared: 0x0  eat_hit: 0x0  replay_src: scf_scc_replay_src_t_ART  propagated_ecc_mismatch: 0x0  mdir_ecc_mismatch: 0x0  ctag_ecc_mismatch: 0x0  owt_ecc_replay_set: 0x0  ort_ecc_replay_set: 0x0  art_ecc_replay_set: 0x0  ecc_replay: 0x0  mpam_ctag_cnt: 0x52  idx_rrpv_value_final: 0x0  idx_rrpv_value_initial: 0x0  psel_counter_final: 0x0  psel_counter_initial: 0x0  brrip_non_tracked_counter_final: 0x0  brrip_non_tracked_counter_initial: 0x0  brrip_tracked_counter_final: 0x0  brrip_tracked_counter_initial: 0x0  scc_default_rrpv: 0x0  lut_default_rrpv: 0x0  scc_drrip_policy: scf_scc_ctag_replacement_policy_t_SRRIP  lut_drrip_policy: scf_scc_ctag_replacement_policy_t_SRRIP  mpam_qpc: scf_scc_qpc_t_H  ctag_reserved_by_owt: 0x0  ctag_reserved_by_ort: 0x0  vdir_reserved_by_owt: 0x0  vdir_reserved_by_ort: 0x0  mdir_reserved_by_owt: 0x0  mdir_reserved_by_ort: 0x0  owt_ncm_cbusy: 0x1  ort_ncm_cbusy: 0x1  mpam_cbusy: 0x0  mpam_cbusy_peer: 0x0  cbusy_peer: 0x0  is_likely_serviced_internally: 0x1  flush_read_req_flush_engine_id: 0x0  is_rar_optimized: 0x0  fcm_spec_read_issued_from_cex3: 0x0  fcm_spec_read_issued_from_cex2: 0x0  local_only_count: 0x2a47  remote_only_count: 0x0  vdir_way_reserved: 0x0  mdir_way_reserved: 0x0  evict_addr: 0x1000fe8cc00  evict_ns: 0x0  evict_nse: 0x0  evict_dir_presence: 0x10080000000000000000001f0  mdir_update: 0x0  vdir_update: 0x0  vc_id: scf_scc_csn_req_vc_t_INB_LOCAL_CPU_LL  mdir_to_vdir_move_valid: 0x0  mdir_wr_valid: 0x0  vdir_wr_valid: 0x0  ctag_rip_wr_valid: 0x0  ctag_wr_valid: 0x0  flush_in_progress: 0x0  mdir_to_vdir: 0x0  vdir_evict: 0x0  vdir_alloc: 0x0  vdir_hit: 0x0  vdir_index: 0x0  
mdir_updated_way2_presence: 0x800000  mdir_updated_way1_presence: 0x0  mdir_updated_way0_presence: 0x2  mdir_update_way2_valid: 0x0  mdir_update_way1_valid: 0x0  mdir_update_way0_valid: 0x0  mdir_hit2_presence: 0x800000  mdir_hit1_presence: 0x0  mdir_hit0_presence: 0x2  mdir_hit2_presence_type: scf_scc_mdir_presence_type_t_FLAT_THIRD_PART  mdir_hit1_presence_type: scf_scc_mdir_presence_type_t_FLAT_SECOND_PART  mdir_hit0_presence_type: scf_scc_mdir_presence_type_t_FLAT_FIRST_PART  mdir_alloc: 0x0  mdir_hit2: 0x1  mdir_way2: 0x400  mdir_hit1: 0x1  mdir_way1: 0x200  mdir_hit0: 0x1  mdir_way0: 0x100  mdir_index: 0x9b  flush_dest: flush_read_req_dest_t_MDIR  dfd_dest: 0x0  is_dfd_write: 0x0  is_dfd_read: 0x0  is_flush_read: 0x0  is_decerr: 0x0  is_from_haq: 0x0  is_haq_pushed: 0x0  is_ncm_retry: 0x0  haq_based_retry: 0x0  mpam_based_retry: 0x0  ncm_based_retry: 0x0  is_retry_nuked: 0x0  prefetch_drop: 0x1  is_replay_after_wakeup: 0x0  iso_kill_sleep: 0x0  iso_pseudo_sleep: 0x0  hit_owt_id: 0x27  iso_hit_owt_head: 0x0  hit_ort_id: 0xxx  iso_hit_ort_head: 0x0  hit_art_id: 0x0  iso_hit_art_head: 0x0  owt_hit: 0x0  ort_hit_is_from_move: 0x0  ort_hit: 0x0  art_hit: 0x0  l2dir_hit: 0x1  is_ephemeral_hit: 0x0  ctag_hit_final: 0x1  ctag_hit: 0x1  l2dir_eviction: 0x0  ctag_eviction_is_gv: 0x0  ctag_eviction_state_is_unique_dirty: 0x0  ctag_eviction_state_is_unique_clean: 0x1  ctag_eviction_state_is_shared: 0x0  ctag_globally_visible: 0x0  ctag_unused_prefetch: 0x0  ctag_alloc_ways_valid_and_not_resvd_and_inactive: 0x7  ctag_alloc_ways_not_valid_and_not_resvd_and_inactive: 0xfff8  ctag_capacity_or_coherency: 0x0  ctag_in_dir_eviction: 0x0  ctag_dirty_eviction: 0x0  ctag_silent_eviction: 0x0  ctag_final_state: scf_scc_ctag_state_t_UD  ctag_initial_state: scf_scc_ctag_state_t_UD  ctag_hashed_index: 0x13e  ctag_way: 0x1  eat_alloc: 0x0  cgid: 0x27  ctag_alloc_valid: 0x1  ctag_alloc_set: 0x1  ctag_alloc: 0x0  esnoop_wakeup_replay_deferred: 0x0  esnoop_wakeup_replay: 0x0  
ctag_check_replay: 0x0  dir_check_replay: 0x0  ctag_eviction_wakeup_replay: 0x0  ctag_replay: 0x0  dir_replay: 0x0  paq_replay: 0x0  full_replay: 0x0  ort_ort_chain: 0x0  art_owt_chain: 0x0  art_ort_chain: 0x0  art_art_chain: 0x0  paq_killed: 0x0  cam_killed: 0x0  killed: 0x0  final_dir_presence: 0x1008000000000000000000002  initial_dir_presence: 0x1008000000000000000000002  initial_l2c_state: scf_scc_dir_l2c_state_t_SHARED  initial_scf_state: scf_scc_mdir_block_scf_state_t_UNIQUE  l2c_state: scf_scc_dir_l2c_state_t_SHARED  scf_state: scf_scc_mdir_block_scf_state_t_UNIQUE  owd_id2: 0x27  owd_id1: 0x1  owd_id0: 0x0  owt_id_secondary: 0x1  secondary_owt_valid: 0x1  owt_id_primary: 0x0  primary_owt_valid: 0x1  ort_valid: 0x1  ort_id: 0x0  art_id_secondary: 0x1  secondary_art_valid: 0x1  art_id_primary: 0x0  primary_art_valid: 0x1  is_full_replay: 0x0  is_full_play: 0x1  is_valid: 0x1  evictandalloc_nop: 0x0  } reset_: 0x1  }'''

grammar = r"""
?start: maybe_space starting nested_capture

nested_capture : open_curly_brace (key_value)+ close_curly_brace

maybe_space : (" "|/\t/)*

open_curly_brace : "{" maybe_space
close_curly_brace : "}" maybe_space
open_brace : "(" maybe_space
close_brace : ")" maybe_space
colon_symbol : ":" maybe_space

anything :  /.+?/

key_value : KEY colon_symbol value_space

KEY : CNAME

TIME : INT

TXN_NAME : CNAME "_uvc::transaction"

txn_name_block : open_brace TXN_NAME anything+ close_brace maybe_space

VERBOSITY_LEVEL : ("H"|"F")

starting : TIME maybe_space colon_symbol VERBOSITY_LEVEL maybe_space colon_symbol anything+ colon_symbol "t" colon_symbol txn_name_block maybe_space

HEX_NUM : ("0x"? (HEXDIGIT|"x")+)

value_space : value maybe_space

value: CNAME
     | HEX_NUM
     | nested_capture

%import common (WORD, HEXDIGIT, NUMBER, ESCAPED_STRING, WS, CNAME, INT)
# %ignore WS
"""
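from lark import Lark

def main():
    parser = Lark(grammar,
                  parser="earley",
                  # parser="cyk",
                  # parser="lalr",
                  keep_all_tokens=True,
                  maybe_placeholders=True,
                  start="start")
    # Parse the full text into a tree.
    tree = parser.parse(text)
    # Separately, run only the lexer and list the tokens it produces.
    tokens = parser.lex(text, dont_ignore=True)
    print(list(tokens))

main()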

I was hoping the start of the string would show up as TIME, but it is showing up as HEX_NUM. I was also expecting some of the CNAME tokens to show up as KEY. Is there some construct I am supposed to use?
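
A possible explanation for the symptom, offered as a sketch rather than a confirmed diagnosis: `Lark.lex()` runs only the lexer, with no grammar context, so when several terminals can match the same text it has to pick one of them. The snippet below uses plain `re` with hand-written patterns that are assumed to approximate the grammar's terminals (they are not Lark's internals, and `candidates` is a hypothetical helper) to show which terminals overlap.

```python
import re

# Hand-written regex approximations of the grammar's terminals.
# Assumption: these mirror Lark's common INT, CNAME and HEXDIGIT definitions.
TERMINALS = {
    "TIME": r"[0-9]+",                        # TIME : INT
    "KEY": r"[A-Za-z_][A-Za-z_0-9]*",         # KEY : CNAME
    "CNAME": r"[A-Za-z_][A-Za-z_0-9]*",       # imported CNAME
    "HEX_NUM": r"(?:0x)?(?:[0-9a-fA-F]|x)+",  # HEX_NUM : "0x"? (HEXDIGIT|"x")+
}


def candidates(text):
    """Return the name of every terminal whose pattern matches the whole text."""
    return [name for name, pattern in TERMINALS.items()
            if re.fullmatch(pattern, text)]


print(candidates("7212775000"))  # ['TIME', 'HEX_NUM']
print(candidates("tag"))         # ['KEY', 'CNAME']
print(candidates("fe"))          # ['KEY', 'CNAME', 'HEX_NUM']
print(candidates("0x1f"))        # ['HEX_NUM']
```

Because the timestamp is a valid match for both TIME and HEX_NUM, and KEY is defined as exactly CNAME, a lexer working without grammar context cannot tell them apart. Inspecting the token types in the tree returned by `parser.parse(text)`, rather than the output of `lex()`, may be closer to what is wanted, since the Earley parser's default dynamic lexer matches terminals according to what the grammar expects at each position.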

TIA

erezsh commented 2 months ago

Sorry you didn't get any response. If it's still relevant, I suggest trying Stack Overflow.